How is energy per token estimated?

The tool divides the GPU's typical power draw (in kW) by its sustained inference throughput (tokens per second), which gives energy per token in kilojoules. It then divides by 3,600 to convert that to kWh. This is a per-GPU approximation. Cooling and power distribution are added through the PUE multiplier, but host and networking overhead are not modeled, so add a margin for them.
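
A minimal sketch of that conversion, assuming power in kW and throughput in tokens per second. The function name `kwh_per_token` is illustrative and is not the tool's actual code.

```python
KJ_PER_KWH = 3600.0  # 1 kWh = 3,600 kJ


def kwh_per_token(power_kw: float, tokens_per_s: float) -> float:
    """Energy per generated token in kWh.

    kW / (tokens/s) = kJ per token; dividing by 3,600 kJ/kWh gives kWh per token.
    """
    if tokens_per_s <= 0:
        raise ValueError("throughput must be positive")
    kj_per_token = power_kw / tokens_per_s
    return kj_per_token / KJ_PER_KWH


# Approximate defaults quoted on this page.
print(f"H100: {kwh_per_token(0.7, 3000):.2e} kWh/token")
print(f"A100: {kwh_per_token(0.4, 1500):.2e} kWh/token")
```

With these defaults the H100 comes out at about 6.5e-8 kWh per token and the A100 at about 7.4e-8.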

Why include a PUE multiplier?

Power Usage Effectiveness (PUE) is total facility energy divided by the energy used by IT equipment, so it captures data-center overhead such as cooling and power distribution. A PUE of 1.2 means every 1 kWh consumed by the GPUs requires 1.2 kWh from the grid. Hyperscale data centers typically run at about 1.1 to 1.2; older facilities run higher.
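
Applying PUE is a single multiplication. The sketch below also shows how much of the grid draw is overhead. The PUE of 1.5 stands in for an older facility and is an illustrative assumption, as is the function name `grid_energy_kwh`.

```python
def grid_energy_kwh(it_energy_kwh: float, pue: float) -> float:
    """Scale IT (GPU) energy by PUE to get the energy drawn from the grid.

    PUE = total facility energy / IT equipment energy, so it can never be below 1.0.
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_energy_kwh * pue


gpu_kwh = 100.0  # illustrative monthly GPU energy
for pue in (1.1, 1.2, 1.5):  # 1.5 is an assumed value for an older facility
    total = grid_energy_kwh(gpu_kwh, pue)
    print(f"PUE {pue}: {total:.1f} kWh from the grid, {total - gpu_kwh:.1f} kWh overhead")
```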

Are these numbers exact?

No. Real energy use depends on batch size, sequence length, model size, quantization and utilization. These are order-of-magnitude planning estimates based on published GPU specs and typical inference throughput, useful for budgeting and ESG reporting, not metering.

What carbon intensity should I use?

Use your region's grid carbon-intensity factor. As a rough guide, that is about 0.05 kg CO2/kWh or less for very clean grids (hydro-heavy Norway, nuclear-heavy France), 0.2 to 0.4 kg CO2/kWh for mixed grids (EU average, UK), and 0.5 to 0.8 kg CO2/kWh for coal-heavy grids. Cloud providers often publish factors for their own regions.
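
To see how much the grid factor matters, apply the rough figures above to the same energy. The sketch uses the midpoint of each quoted range as an illustrative assumption. Replace these values with your region's published factor.

```python
def emissions_kg(grid_kwh: float, kg_co2_per_kwh: float) -> float:
    """Emissions = grid energy x carbon intensity."""
    return grid_kwh * kg_co2_per_kwh


# Representative factors in kg CO2/kWh. The mixed and coal-heavy values are
# assumed midpoints of the quoted ranges, not measured figures.
GRID_FACTORS = {
    "very clean": 0.05,
    "mixed": 0.3,        # midpoint of 0.2-0.4
    "coal-heavy": 0.65,  # midpoint of 0.5-0.8
}

monthly_grid_kwh = 100.0  # illustrative
for grid, factor in GRID_FACTORS.items():
    print(f"{grid}: {emissions_kg(monthly_grid_kwh, factor):.1f} kg CO2")
```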

Does this apply if I only call an API?

Yes, conceptually. Even when you do not own the GPUs, your token volume drives real energy use upstream. The estimate lets you attribute an approximate footprint to your application for carbon-aware reporting.

LLM Energy Cost Calculator

Every token your application generates burns real electricity on a GPU somewhere. This calculator turns monthly token volume into estimated compute hours, an electricity bill, and a CO2-equivalent footprint so you can budget AI spend in both dollars and carbon.

How it works

The model is deliberately simple and transparent:

Energy per token. Each GPU has a typical power draw (kW) and a sustained inference throughput (tokens/second). Dividing power by throughput gives energy per token in kilojoules, and dividing that by 3,600 converts it to kWh per token.
GPU energy. Multiply energy per token by your monthly token volume.
Grid energy. Multiply by a PUE factor to add data-center overhead (cooling, power distribution).
Cost and carbon. Multiply grid energy by your electricity price for cost, and by your grid’s carbon intensity for emissions.
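
The four steps above can be sketched as one function. The names `estimate` and `Estimate` are hypothetical and are not the calculator's source. The usage scenario (100 million tokens on A100 defaults) is illustrative only.

```python
from dataclasses import dataclass

KJ_PER_KWH = 3600.0  # 1 kWh = 3,600 kJ


@dataclass
class Estimate:
    gpu_kwh: float
    grid_kwh: float
    cost_usd: float
    co2_kg: float


def estimate(tokens: float, power_kw: float, tokens_per_s: float,
             pue: float, price_per_kwh: float, kg_co2_per_kwh: float) -> Estimate:
    # 1. Energy per token: kW / (tokens/s) = kJ/token, then kJ -> kWh.
    kwh_per_token = power_kw / tokens_per_s / KJ_PER_KWH
    # 2. GPU energy for the whole monthly token volume.
    gpu_kwh = kwh_per_token * tokens
    # 3. Grid energy: add data-center overhead through PUE.
    grid_kwh = gpu_kwh * pue
    # 4. Cost and carbon both scale with grid energy.
    return Estimate(gpu_kwh, grid_kwh, grid_kwh * price_per_kwh, grid_kwh * kg_co2_per_kwh)


# Illustrative: 100M tokens/month on A100 defaults, PUE 1.2, $0.15/kWh, 0.25 kg CO2/kWh.
e = estimate(100e6, power_kw=0.4, tokens_per_s=1500, pue=1.2,
             price_per_kwh=0.15, kg_co2_per_kwh=0.25)
print(f"GPU {e.gpu_kwh:.1f} kWh, grid {e.grid_kwh:.1f} kWh, "
      f"${e.cost_usd:.2f}, {e.co2_kg:.1f} kg CO2")
```

Every output is linear in token volume, so doubling traffic doubles the energy, the cost and the carbon.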

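Defaults reflect published power specs and typical throughput figures: an A100 draws ~0.4 kW and serves on the order of 1,500 tokens/s for a mid-size model, while an H100 draws ~0.7 kW but sustains roughly 3,000 tokens/s. That works out to about 0.23 J/token for the H100 versus 0.27 J/token for the A100. The H100 is therefore more energy-efficient per token despite its higher peak power.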
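
A quick check of that comparison using the approximate defaults above. The figures are the page's rough values, not benchmarks.

```python
# Approximate page defaults: (power in kW, sustained throughput in tokens/s).
GPUS = {
    "A100": (0.4, 1500),
    "H100": (0.7, 3000),
}


def joules_per_token(power_kw: float, tokens_per_s: float) -> float:
    # Convert kW to W; then W / (tokens/s) = J/token.
    return power_kw * 1000.0 / tokens_per_s


results = {name: joules_per_token(p, t) for name, (p, t) in GPUS.items()}
for name, j in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: {j:.3f} J/token")

saving = 1 - results["H100"] / results["A100"]
print(f"H100 saves about {saving:.1%} energy per token versus A100")
```

The saving is about one-eighth. The H100 draws 1.75× the power but delivers 2× the throughput.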

Worked example

Serving 500 million tokens/month on H100s at $0.15/kWh, with a grid intensity of 0.25 kg CO2/kWh and a PUE of 1.2:

Energy/token: 0.7 kW ÷ 3,000 tok/s ≈ 6.5e-8 kWh/token
GPU energy: ~32.4 kWh → with PUE 1.2 ≈ 38.9 kWh
Cost: ≈ $5.83/month
Carbon: ≈ 9.7 kg CO2/month
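
The same arithmetic with the units carried explicitly, including the 3,600 kJ-per-kWh conversion that the summary lines skip:

```latex
\begin{aligned}
E_{\text{tok}} &= \frac{0.7\ \text{kW}}{3000\ \text{tok/s}}
  = 2.33\times10^{-4}\ \text{kJ/tok}
  = \frac{2.33\times10^{-4}}{3600}\ \text{kWh/tok}
  \approx 6.48\times10^{-8}\ \text{kWh/tok} \\
E_{\text{GPU}} &= 5\times10^{8}\ \text{tok} \times 6.48\times10^{-8}\ \text{kWh/tok} \approx 32.4\ \text{kWh} \\
E_{\text{grid}} &= 1.2 \times 32.4\ \text{kWh} \approx 38.88\ \text{kWh} \\
\text{Cost} &= 38.88\ \text{kWh} \times \$0.15/\text{kWh} \approx \$5.83 \\
\mathrm{CO_2} &= 38.88\ \text{kWh} \times 0.25\ \text{kg/kWh} \approx 9.72\ \text{kg}
\end{aligned}
```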

That is roughly the emissions of driving a petrol car 50 to 80 km: small for a single app, but material at fleet scale.
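
The car comparison is a single division. The sketch assumes a petrol car emits roughly 0.12 to 0.19 kg CO2 per km. That range is an assumption chosen to match the 50 to 80 km quoted above, so check it against your own reference.

```python
def km_equivalent(co2_kg: float, kg_co2_per_km: float) -> float:
    """Distance a car would drive to emit the same amount of CO2."""
    return co2_kg / kg_co2_per_km


monthly_co2_kg = 9.7  # from the worked example
# Assumed petrol-car emission factors (kg CO2/km): a less and a more efficient car.
for per_km in (0.19, 0.12):
    print(f"at {per_km} kg/km: ~{km_equivalent(monthly_co2_kg, per_km):.0f} km")
```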

Tips

H100s usually win on energy-per-token despite higher wattage; throughput matters more than peak draw.
Cleaner cloud regions can cut your reported footprint by 5 to 10× without any code changes, so schedule batch jobs where the grid is greenest.
Combine with the LLM API Cost Calculator to see dollar and carbon budgets together.
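
For deferrable batch work, the region choice can be as simple as taking the minimum carbon intensity. This is a minimal sketch that assumes you already have per-region factors. The region names and numbers are hypothetical placeholders for the figures your cloud provider publishes.

```python
# Hypothetical region names and carbon factors (kg CO2/kWh).
REGION_INTENSITY = {
    "region-a": 0.05,
    "region-b": 0.30,
    "region-c": 0.40,
}


def greenest_region(intensity: dict[str, float]) -> str:
    """Return the region with the lowest grid carbon intensity."""
    return min(intensity, key=intensity.get)


best = greenest_region(REGION_INTENSITY)
worst = max(REGION_INTENSITY, key=REGION_INTENSITY.get)
ratio = REGION_INTENSITY[worst] / REGION_INTENSITY[best]
print(f"Run batch jobs in {best}: {ratio:.0f}x lower emissions than {worst} for the same energy")
```

A real scheduler would also weigh latency, price, data residency and how the grid factor changes over the day. This sketch covers only the carbon ranking.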