Whether we embrace them or not, large language models (LLMs) have quickly become woven into the fabric of daily life, powering everything from chatbots and virtual assistants to search engines and productivity tools. But behind their convenience lies a sobering environmental cost: energy-hungry models may be accelerating our path toward climate catastrophe.
A recent study published in Frontiers in Communication reveals a stark disparity in emissions between different LLMs. Some models generate up to 50 times more carbon emissions than others when responding to the same question—raising serious concerns about the sustainability of AI.
Emissions Vary Widely Among Models
The research, led by a team at Hochschule München University of Applied Sciences in Germany, evaluated 14 prominent LLMs ranging from 7 to 72 billion parameters. Using 1,000 benchmark questions covering various subjects, the study examined the models’ energy usage and corresponding emissions.
The results? Models that relied heavily on internal reasoning processes emitted drastically more carbon dioxide than those that produced more concise answers. GPT-3.5, for example, is a relatively concise model. In contrast, reasoning-heavy models like GPT-4o consume significantly more energy due to their complex internal computations.
Thinking Tokens: The Culprit Behind the Carbon
At the heart of this discrepancy lies a seemingly abstract concept: tokens. LLMs convert user prompts into tokens—numerical representations of text. Reasoning models generate hundreds of additional “thinking tokens” per question to conduct deeper internal analysis before forming a response.
On average, reasoning models created 543.5 thinking tokens per question, compared to just 37.7 tokens for concise models. This ballooning computation translates directly into higher energy use—and, consequently, more CO₂ emissions.
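Taken at face value, those averages imply a large multiplier. Here is a back-of-the-envelope sketch; the assumption that per-query energy scales roughly linearly with generated tokens is ours, not the study's:

```python
# Rough token-count comparison using the study's reported averages.
# Assumption (ours): per-query energy scales roughly with tokens generated,
# so the token ratio approximates the extra computational effort.

REASONING_TOKENS = 543.5  # avg "thinking" tokens per question (from the study)
CONCISE_TOKENS = 37.7     # avg tokens per question for concise models (from the study)

ratio = REASONING_TOKENS / CONCISE_TOKENS
print(f"Reasoning models generate roughly {ratio:.1f}x more tokens per question")
```

Under that linear assumption, reasoning models do about 14 times more token-generation work per question, before any other architectural differences are counted.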
Accuracy Comes at an Environmental Price
The study confirmed a troubling trend: the more accurate a model is, the greater its environmental impact. One high-performing model, Cogito (with 70 billion parameters), reached 84.9% accuracy—but emitted three times more CO₂ than equally large models offering shorter, less precise answers.
“We found that reasoning-enabled models produced up to 50 times more CO₂ emissions than concise response models,” said lead author Maximilian Dauner in a statement. “Currently, we see a clear accuracy-sustainability trade-off inherent in LLM technologies.”
None of the models that kept emissions below 500 grams of CO₂ equivalent exceeded 80% accuracy on the 1,000-question benchmark.
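One way to think about this trade-off is as a constrained choice: given an emissions budget, pick the most accurate model that fits. A minimal Python sketch; the model names and numbers below are invented placeholders, and only the overall pattern (low-emission models topping out below the accuracy of reasoning models) comes from the study:

```python
# Hypothetical illustration of the accuracy-sustainability trade-off.
# All model names and figures below are invented for illustration only.

def best_under_budget(models, budget_g):
    """Return the most accurate model whose benchmark emissions fit the budget."""
    eligible = [m for m in models if m["co2_g"] <= budget_g]
    return max(eligible, key=lambda m: m["accuracy"], default=None)

models = [
    {"name": "concise-7b",    "accuracy": 0.62, "co2_g": 30},    # invented
    {"name": "concise-70b",   "accuracy": 0.78, "co2_g": 450},   # invented
    {"name": "reasoning-70b", "accuracy": 0.85, "co2_g": 1350},  # invented
]

# With a 500 g budget, the reasoning model is excluded despite its accuracy.
print(best_under_budget(models, budget_g=500)["name"])
```

The point of the sketch is the selection logic, not the figures: once emissions become a hard constraint, the highest-accuracy option may simply be off the table.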
Complexity of Subject Matter Amplifies Emissions
The environmental footprint of a single query also depends heavily on the topic. Subjects like abstract algebra or philosophy, which require layered reasoning, led to six times higher emissions than straightforward factual queries.
This reveals yet another layer of complexity: even the same model can vary in efficiency depending on the nature of the question.
Caveats and Call to Action
It’s important to note that emissions data depends heavily on factors like local energy grid mix, model architecture, and hardware efficiency. The authors caution against overgeneralizing—but still advocate for more thoughtful AI usage.
“Users can significantly reduce emissions by prompting AI to generate concise answers or limiting the use of high-capacity models to tasks that genuinely require that power,” Dauner said.
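That advice can be baked into the prompt itself. A minimal sketch, assuming the common role/content chat-message convention; the helper name and word limit are illustrative, not from the study:

```python
# Sketch: steer a model toward shorter answers with an explicit brevity
# instruction. The role/content message format below is the common chat
# convention; adapt it to whatever client library you use.

def concise_prompt(question: str, max_words: int = 50) -> list:
    """Wrap a question with a brevity instruction as a chat-message list."""
    return [
        {"role": "system",
         "content": f"Answer in at most {max_words} words. "
                    "Do not include step-by-step reasoning in the output."},
        {"role": "user", "content": question},
    ]

messages = concise_prompt("What is abstract algebra?")
print(messages[0]["content"])
```

Capping output length (for example, via a maximum-tokens setting where the API offers one) works in the same spirit: fewer generated tokens, less computation per answer.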
Frequently Asked Questions
Why do some AI models emit more CO₂ than others?
Different models vary in architecture, size (number of parameters), and how they process queries. More advanced models often engage in internal reasoning using additional “thinking tokens,” which significantly increases computational effort—and thus energy consumption and emissions.
What are “thinking tokens”?
Thinking tokens are internal tokens generated by some reasoning-based models to perform deeper, multi-step analysis before producing an output. While they improve accuracy, they also require more energy and lead to higher emissions.
Are more accurate AI models always worse for the environment?
Not always, but there is often a trade-off. According to the study, models that achieve higher accuracy tend to consume more energy, as they perform more computations per query. This leads to increased carbon emissions, especially in tasks that require complex reasoning.
How much more pollution can a reasoning model produce?
The study found that reasoning-enabled models can emit up to 50 times more CO₂ per question than concise models. That’s a significant difference for users and companies scaling AI usage.
Does the topic of the question matter?
Yes. Questions involving subjects like philosophy, mathematics, or other abstract reasoning tend to result in higher emissions, as they demand more complex processing than straightforward factual queries.
Are these findings applicable to all LLMs and regions?
Not exactly. Emissions can vary based on the energy efficiency of data centers, the type of hardware used, and the local electricity grid (renewables vs fossil fuels). However, the trends in the study highlight consistent patterns across multiple models.
Conclusion
As large language models become more integrated into our daily lives, it’s essential to recognize that every AI interaction carries an environmental cost. The latest research reveals a significant and often overlooked truth: not all models are created equal when it comes to sustainability.
Reasoning-enabled models may offer greater accuracy and depth, but they do so at a steep environmental price—sometimes generating up to 50 times more carbon emissions than their concise counterparts. This accuracy-sustainability trade-off challenges us to rethink how, when, and why we use AI.