A chatbot might not break a sweat every time you ask it to make your shopping list or come up with its best dad jokes. But over time, the planet might.
As generative AI such as large language models (LLMs) becomes more ubiquitous, critical questions loom. For every interaction you have with AI, how much energy does it take — and how much carbon is emitted into the atmosphere?
Earlier this month, OpenAI CEO Sam Altman claimed that an “average ChatGPT query” uses energy equal to “about what an oven would use in a little over one second.” That’s within the realm of reason: AI research firm Epoch AI previously calculated a similar estimate. However, experts say the claim lacks key context, like what an “average” query even is.
“If you wanted to be rigorous about it, you would have to give a range,” says Sasha Luccioni, an AI researcher and climate lead at the AI firm Hugging Face. “You can’t just throw a number out there.”
Major players including OpenAI and Anthropic have the data, but they’re not sharing it. Instead, researchers can only piece together limited clues from open-source LLMs. One study published June 19 in Frontiers in Communication examined 14 such models, including those from Meta and DeepSeek, and found that some models produced up to 50 times more CO₂ emissions than others.
But these numbers merely offer a narrow snapshot — and they only get more dire after factoring in the carbon cost of training models, the cost of manufacturing and maintaining the hardware that runs them, and the scale at which generative AI is poised to permeate our daily lives.
“Machine learning research has been driven by accuracy and performance,” says Mosharaf Chowdhury, a computer scientist at the University of Michigan in Ann Arbor. “Energy has been the middle child that nobody wants to talk about.”
Science News spoke with four experts to unpack these hidden costs and what they mean for AI’s future.
What makes large language models so energy-hungry?
You’ll often hear people describe LLMs by the number of parameters they have. Parameters are the internal knobs the model adjusts during training to improve its performance. The more parameters, the more capacity the model has to learn patterns and relationships in data. GPT-4, for example, is estimated to have over a trillion parameters.
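To make “parameters” concrete: with an open-source model, you can count them in a few lines of code. Here is a minimal sketch using the Hugging Face transformers library, with GPT-2 chosen only because it is small and publicly available:

```python
# A minimal sketch of counting a model's parameters with Hugging Face
# transformers. GPT-2 is used only because it is small and public.
from transformers import AutoModel

model = AutoModel.from_pretrained("gpt2")
total = sum(p.numel() for p in model.parameters())
print(f"{total:,} parameters")  # roughly 124 million for GPT-2
```

By this measure, a frontier model is several thousand times larger than GPT-2.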
“If you want to learn all the knowledge of the world, you need bigger and bigger models,” MIT computer scientist Noman Bashir says.
Models like these don’t run on your laptop. Instead, they’re deployed in massive data centers located across the world. In each center, the models are loaded on servers containing powerful chips called graphics processing units (GPUs), which do the number crunching needed to generate helpful outputs. The more parameters a model has, generally the more chips are needed to run it — especially to get users the fastest response possible.
All of this takes energy. Data centers, which serve a variety of tech demands including AI, already account for 4.4 percent of all electricity used in the U.S. By 2028, that share is projected to reach as much as 12 percent.
Why is it so difficult to measure the carbon footprint of LLMs?
Before anyone can ask a model a question, it must first be trained. During training, a model digests vast datasets and adjusts its internal parameters accordingly. It often takes weeks and thousands of GPUs, burning an enormous amount of energy. But since companies rarely disclose their training methods — what data they used, how much compute time or what kind of energy powered it — the emissions from this process are largely a black box.
The second half of the model’s life cycle is inference, which happens every time a user prompts the model. Over time, inference is expected to account for the bulk of a model’s emissions. “You train a model once, then billions of users are using the model so many times,” Chowdhury says.
But inference, too, is difficult to quantify. The environmental impact of a single query can vary dramatically depending on which data center it’s routed to, which energy grid powers the data center and even the time of day. Ultimately, only the companies running these models have a complete picture.
Is there any way to estimate an LLM’s energy use?
For training, not really. For inference, kind of.
OpenAI and Anthropic keep their models proprietary, but other companies such as Meta and DeepSeek release open-source versions of their AI products. Researchers can run these models locally and measure the energy consumed by their GPU as a proxy for how much energy inference would take.
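In practice, that proxy measurement can be as simple as polling the GPU’s reported power draw while the model generates text, then integrating over time. Below is a minimal sketch, assuming an Nvidia GPU with the pynvml bindings and the transformers library installed; the model (GPT-2) and the sampling interval are placeholders, not the setup used in the study:

```python
# Minimal sketch: estimate the GPU energy of one inference by sampling the
# card's reported power draw. Assumes an Nvidia GPU with pynvml installed;
# the model (GPT-2) and 0.1-second sampling interval are placeholders,
# not the study's actual setup.
import threading
import time

import pynvml
from transformers import pipeline

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

samples = []  # (timestamp in seconds, power in watts)
done = threading.Event()

def poll_power(interval=0.1):
    while not done.is_set():
        milliwatts = pynvml.nvmlDeviceGetPowerUsage(handle)
        samples.append((time.time(), milliwatts / 1000.0))
        time.sleep(interval)

poller = threading.Thread(target=poll_power)
poller.start()

generator = pipeline("text-generation", model="gpt2", device=0)
generator("Explain why the sky is blue.", max_new_tokens=100)

done.set()
poller.join()
pynvml.nvmlShutdown()

# Integrate power over time (trapezoidal rule) to get energy in joules.
energy_joules = sum(
    0.5 * (samples[i][1] + samples[i - 1][1]) * (samples[i][0] - samples[i - 1][0])
    for i in range(1, len(samples))
)
print(f"~{energy_joules:.0f} J (~{energy_joules / 3600:.4f} Wh) of GPU energy")
```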
In their new study, Maximilian Dauner and Gudrun Socher at Munich University of Applied Sciences in Germany tested 14 open-source AI models, ranging from 7 billion to 72 billion parameters (those internal knobs), on an Nvidia A100 GPU. Reasoning models, which explain their thinking step by step, consumed far more energy during inference than standard models, which directly output the answer.
The reason comes down to tokens, or the bits of text a model processes to generate a response. More tokens mean more computation and higher energy use. On average, reasoning models used 543.5 tokens per question, compared to just 37.7 for standard models. At scale, the queries add up: Using the 70-billion-parameter reasoning model DeepSeek R1 to answer 600,000 questions would emit as much CO₂ as a round-trip flight from London to New York.
In reality, the numbers are likely even higher. Many companies have switched over to Nvidia’s newer H100, a chip specifically optimized for AI workloads that’s even more power-hungry than the A100. And to more accurately reflect the total energy used during inference — including cooling systems and other supporting hardware — previous research suggests that reported GPU energy consumption should roughly be doubled.
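To see how those pieces fit together, here is a back-of-the-envelope sketch of the arithmetic. The energy-per-token figure and the grid carbon intensity below are illustrative assumptions, not values reported by Dauner and Socher; only the query count, the average token count and the rough doubling for overhead come from the reporting above:

```python
# Back-of-the-envelope estimate tying tokens to CO2 emissions.
# ASSUMED (illustrative only): energy per token and grid carbon intensity.
# From the reporting above: the query count, the average token count for
# reasoning models, and the rough doubling for cooling and other overhead.
QUERIES = 600_000
TOKENS_PER_QUERY = 543.5       # avg for reasoning models in the study
ENERGY_PER_TOKEN_WH = 0.002    # assumed GPU energy per token, in watt-hours
OVERHEAD_FACTOR = 2.0          # cooling and supporting hardware
GRID_G_CO2_PER_KWH = 400       # assumed grid carbon intensity

energy_kwh = (QUERIES * TOKENS_PER_QUERY * ENERGY_PER_TOKEN_WH
              * OVERHEAD_FACTOR / 1000)
emissions_kg = energy_kwh * GRID_G_CO2_PER_KWH / 1000
print(f"{energy_kwh:,.0f} kWh -> {emissions_kg:,.0f} kg CO2")
```

The structure of the calculation, queries times tokens times energy per token times grid intensity, is the point here; pinning down the two assumed constants is exactly what studies of open-source models attempt.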
Even so, none of that accounts for the emissions generated by manufacturing the hardware and constructing the buildings that house it, what’s known as embodied carbon, Bashir points out.
What can people do to make their AI usage more environmentally friendly?
Choosing the right model for each task makes a difference. “Is it always needed to use the biggest model for easy questions?” Dauner asks. “Or can a small model also answer easy questions, and we can reduce CO₂ emissions based on that?”
Similarly, not every question needs a reasoning model. For example, Dauner’s study found that the standard model Qwen 2.5 achieved accuracy comparable to the reasoning model Cogito 70B while producing less than a third of the CO₂ emissions.
Researchers have created other public tools to measure and compare AI energy use. Hugging Face runs a leaderboard called AI Energy Score, which ranks models based on how much energy they use across 10 tasks, from text generation to image classification to voice transcription. It covers both open-source and proprietary models. The idea is to help people choose the most efficient model for a given job, finding that “golden spot” between performance, accuracy and energy efficiency.
Chowdhury also helps run ML.Energy, which has a similar leaderboard. “You can save a lot of energy by giving up a tiny bit of performance,” Chowdhury says.
Using AI less frequently during the daytime or summer, when power demand spikes and cooling systems work overtime, can also make a difference. “It’s similar to AC,” Bashir says. “If the outside temperature is very high, you would need more energy to cool down the inside of the house.”
Even the way you phrase your queries matters. Environmentally speaking, there’s no need to be polite to the chatbot. Any extra input you put in takes more processing power to parse. “It costs millions of [extra] dollars because of ‘thank you’ and ‘please,’” Dauner says. “Every unnecessary word has an influence on the run time.”
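The effect is easy to verify with any open tokenizer. A minimal sketch, using the GPT-2 tokenizer purely as an example:

```python
# Minimal sketch: extra words mean extra tokens for the model to process.
# The GPT-2 tokenizer is used purely as an example.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

terse = "Summarize this article."
polite = "Hello! Could you please summarize this article for me? Thank you!"

print(len(tokenizer.encode(terse)), "tokens")   # the same request...
print(len(tokenizer.encode(polite)), "tokens")  # ...at roughly triple the tokens
```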
Ultimately, however, policy must catch up. Luccioni suggests a framework based on an energy rating system, like those used for household appliances. For example, “if your model is being used by, say, 10 million users a day or more, it has to have an energy score of B+ or higher,” she says.
Otherwise, energy supply won’t be able to sustain AI’s growing demand. “I go to conferences where grid operators are freaking out,” Luccioni says. “Tech companies can’t just keep doing this. Things are going to start going south.”