Decentralized LLM Inference over Edge Networks with Energy Harvesting

As large language models continue to grow, running them efficiently on small edge devices remains one of AI’s biggest challenges. This paper explores how LLM inference can be decentralized across a network of energy-harvesting devices, each powered by an intermittent renewable source, while still delivering reliable performance. To capture the behavior of these devices, the authors introduce a semi-Markov model that represents the battery level, job queue, and processing mode of each node (sketched in code after the list below). Based on this model, they develop lightweight scheduling strategies that decide which device should process each inference step. Three methods are evaluated:
- Uniform scheduling, which assigns tasks uniformly at random,
- Long-term model-based scheduling, which relies on average energy patterns, and
- Adaptive scheduling, which responds instantly to real-time energy arrivals.
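The paper's exact formulation is not reproduced here, so the following is a minimal sketch of the per-node state that such a semi-Markov model tracks, assuming a battery level in abstract energy units, an integer job queue, and a small set of processing modes; the names (`NodeState`, `Mode`, `step`) and the specific mode set are hypothetical illustrations, not the authors' definitions.

```python
from dataclasses import dataclass
from enum import Enum

class Mode(Enum):
    # Assumed processing modes; the paper's actual mode set may differ.
    ACTIVE = "active"      # executing an inference step
    IDLE = "idle"          # has energy but no pending work
    CHARGING = "charging"  # harvesting only; too little energy to compute

@dataclass
class NodeState:
    battery: float  # stored energy, in abstract units
    queue: int      # number of pending inference jobs
    mode: Mode      # current processing mode

    def step(self, harvested: float, arrivals: int,
             cost: float, capacity: float) -> None:
        """Advance one sojourn of the chain (holding times abstracted away).

        `harvested` is the energy that arrived during the sojourn,
        `arrivals` the number of new jobs, `cost` the energy for one
        inference step, and `capacity` the battery limit.
        """
        self.battery = min(self.battery + harvested, capacity)
        self.queue += arrivals
        if self.queue > 0 and self.battery >= cost:
            # Enough energy and pending work: serve one job.
            self.battery -= cost
            self.queue -= 1
            self.mode = Mode.ACTIVE
        elif self.battery < cost:
            self.mode = Mode.CHARGING
        else:
            self.mode = Mode.IDLE
```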
Results show that energy-aware scheduling significantly reduces device inactivity and increases the overall job throughput, particularly when energy is scarce. The adaptive strategy performs best, maintaining high responsiveness even under unpredictable energy conditions. Together, these findings highlight a promising direction: by combining decentralized inference, energy harvesting, and adaptive resource management, advanced AI models can be deployed sustainably at the edge without relying on power-hungry data centers.
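To make the comparison concrete, here is one plausible reading of the three policies as selection rules over a set of nodes, reusing the `NodeState` sketch above. The function names, the `avg_energy` statistics, and the use of the current battery level as the "real-time energy" signal are assumptions for illustration, not the paper's definitions.

```python
import random

def uniform_schedule(nodes):
    # Uniform scheduling: choose a node uniformly at random,
    # ignoring energy state entirely.
    return random.choice(nodes)

def long_term_schedule(nodes, avg_energy):
    # Long-term model-based scheduling: bias the choice by each node's
    # long-run average harvested energy (avg_energy[i] pairs with nodes[i]).
    return random.choices(nodes, weights=avg_energy, k=1)[0]

def adaptive_schedule(nodes):
    # Adaptive scheduling: react to real-time energy arrivals by picking
    # the node whose battery currently holds the most stored energy.
    return max(nodes, key=lambda n: n.battery)

# Example: the adaptive rule routes the job to the best-charged node.
nodes = [NodeState(battery=b, queue=0, mode=Mode.IDLE)
         for b in (0.2, 1.5, 0.8)]
print(adaptive_schedule(nodes).battery)  # -> 1.5
```

The design difference the abstract emphasizes shows up directly: the long-term rule uses only stationary statistics and so cannot react when a usually sunny node goes dark, while the adaptive rule keys off the instantaneous state at dispatch time.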