Queries requiring AI chatbots like OpenAI’s ChatGPT to think logically and reason produce more carbon emissions than other types of questions, according to a new study.
Every query typed into a large language model like ChatGPT requires energy and leads to carbon dioxide emissions. The emission levels depend on the chatbot, the user, and the subject matter, researchers at Germany’s Hochschule München University of Applied Sciences say.
The study, published in the journal Frontiers in Communication, compares 14 AI models and finds that answers requiring complex reasoning cause more carbon emissions than simple answers.
Queries on subjects that demand lengthy reasoning, such as abstract algebra or philosophy, cause up to six times more emissions than queries on more straightforward subjects such as high school history.
Researchers recommend that frequent users of AI chatbots adjust the kind of questions they pose to limit carbon emissions.
The study assesses the 14 LLMs on 1,000 standardised questions spanning a range of subjects to compare their carbon emissions.
“The environmental impact of questioning trained LLMs is strongly determined by their reasoning approach, with explicit reasoning processes significantly driving up energy consumption and carbon emissions,” study author Maximilian Dauner says.
“We found that reasoning-enabled models produced up to 50 times more carbon dioxide emissions than concise response models.”
When a user puts a question to an AI chatbot, the words or parts of words in the query, known as tokens, are converted into a string of numbers and processed by the model. This conversion and the other computation the AI performs produce carbon emissions.
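As a rough illustration of that conversion step, the sketch below uses the open-source tiktoken tokeniser to turn a query into token IDs. This is an assumption made for illustration only; the models in the study each use their own tokenisers.

```python
# Minimal tokenisation sketch using the open-source "tiktoken" library
# (an illustrative assumption; the study's models use their own tokenisers).
# Install with: pip install tiktoken
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")  # a widely used tokeniser

query = "Explain Lagrange's theorem in abstract algebra."
token_ids = encoding.encode(query)  # words and word pieces become integer IDs

print(token_ids)       # a list of integers, one per token
print(len(token_ids))  # the number of tokens the model must process
```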
The study notes that reasoning models create 543.5 tokens per question on average, while concise models require only 40.
“A higher token footprint always means higher CO2 emissions,” it says.
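A back-of-the-envelope calculation, sketched below, shows how those averages translate into a relative footprint. The per-token emission factor used here is a hypothetical placeholder, not a figure from the study.

```python
# Relative token footprint of reasoning vs concise answers, using the
# per-question averages reported in the study (543.5 vs 40 tokens).
REASONING_TOKENS = 543.5
CONCISE_TOKENS = 40

ratio = REASONING_TOKENS / CONCISE_TOKENS
print(f"Reasoning answers generate ~{ratio:.1f}x more tokens per question")
# ~13.6x more tokens; the study's up-to-50x emissions gap also reflects
# other factors, such as the size of the models involved.

# With a purely hypothetical emission factor (NOT a value from the study):
GRAMS_CO2E_PER_TOKEN = 0.002  # placeholder assumption
for label, tokens in (("reasoning", REASONING_TOKENS), ("concise", CONCISE_TOKENS)):
    grams = tokens * GRAMS_CO2E_PER_TOKEN * 1000  # per 1,000 questions
    print(f"{label}: ~{grams:.0f} g CO2e per 1,000 questions")
```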
For instance, one of the most accurate models is Cogito, which reaches about 85 per cent accuracy but produces three times more carbon emissions than similarly sized models that provide concise answers.
"Currently, we see a clear accuracy-sustainability trade-off inherent in LLM technologies," Dr Dauner says. "None of the models that kept emissions below 500 grams of carbon dioxide equivalent achieved higher than 80 per cent accuracy on answering the 1,000 questions correctly.”
Carbon dioxide equivalent is a unit that expresses the climate impact of different greenhouse gases in terms of the amount of carbon dioxide that would cause the same warming.
Researchers hope the new findings will cause people to make more informed decisions about their AI use.
Citing an example, the researchers say that asking the DeepSeek R1 chatbot to answer 600,000 questions could create carbon emissions roughly equal to those of a round-trip flight from London to New York.
In comparison, Alibaba Cloud’s Qwen 2.5 can answer more than three times as many questions with similar accuracy rates while generating the same emissions.
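The sketch below works through that comparison as simple arithmetic. The flight-emissions figure is an illustrative ballpark assumption used only as a placeholder, not a number from the study.

```python
# Rough per-question emissions implied by the comparison above.
# The flight figure is an illustrative ballpark assumption (~1 tonne CO2e
# per passenger for a London-New York round trip), not a study value.
FLIGHT_CO2E_GRAMS = 1_000_000            # placeholder: ~1 tonne CO2e
DEEPSEEK_QUESTIONS = 600_000             # questions per "flight's worth" of emissions
QWEN_QUESTIONS = 3 * DEEPSEEK_QUESTIONS  # "more than three times as many"

deepseek_per_q = FLIGHT_CO2E_GRAMS / DEEPSEEK_QUESTIONS
qwen_per_q = FLIGHT_CO2E_GRAMS / QWEN_QUESTIONS

print(f"DeepSeek R1: ~{deepseek_per_q:.2f} g CO2e per question")  # ~1.67 g
print(f"Qwen 2.5:    ~{qwen_per_q:.2f} g CO2e per question")      # ~0.56 g
```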
"Users can significantly reduce emissions by prompting AI to generate concise answers or limiting the use of high-capacity models to tasks that genuinely require that power," Dr Dauner says.