Every token your application generates burns real electricity on a GPU somewhere. This calculator turns monthly token volume into estimated compute hours, an electricity bill, and a CO2-equivalent footprint so you can budget AI spend in both dollars and carbon.
How it works
The model is deliberately simple and transparent:
- Energy per token. Each GPU has a typical power draw (kW) and a sustained inference throughput (tokens/second). Dividing power by throughput gives kilojoules per token (kW ÷ tokens/s = kJ/token); dividing that by 3,600 converts it to kWh per token.
- GPU energy. Multiply energy per token by your monthly token volume.
- Grid energy. Multiply by the data center's PUE (Power Usage Effectiveness, the ratio of total facility energy to IT equipment energy) to add overhead such as cooling and power distribution.
- Cost and carbon. Multiply grid energy by your electricity price ($/kWh) for cost, and by your grid’s carbon intensity (kg CO2e/kWh) for emissions.
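The four steps above can be sketched in Python as follows. This is a minimal sketch, not the calculator's actual source: the `estimate` function, its parameter names, and the GPU-hours formula (token volume ÷ throughput ÷ 3,600) are illustrative assumptions, and it assumes the GPU draws its stated power for exactly the time needed to serve the tokens.

```python
from dataclasses import dataclass

KJ_PER_KWH = 3600.0        # 1 kWh = 3,600 kJ
SECONDS_PER_HOUR = 3600.0


@dataclass
class Estimate:
    gpu_hours: float   # assumed: sustained serving time, in GPU-hours
    gpu_kwh: float     # energy drawn by the GPUs alone
    grid_kwh: float    # GPU energy scaled by PUE
    cost_usd: float
    co2e_kg: float


def estimate(tokens_per_month: float, gpu_kw: float, tokens_per_sec: float,
             pue: float, usd_per_kwh: float, kg_co2e_per_kwh: float) -> Estimate:
    """Hypothetical helper mirroring the four steps in the text."""
    # 1. Energy per token: kW / (tokens/s) = kJ/token, then kJ -> kWh.
    kwh_per_token = gpu_kw / tokens_per_sec / KJ_PER_KWH
    # 2. GPU energy for the month.
    gpu_kwh = kwh_per_token * tokens_per_month
    # 3. Grid energy: add data-center overhead via PUE.
    grid_kwh = gpu_kwh * pue
    # Assumed compute-hours formula: time to serve all tokens at the sustained rate.
    gpu_hours = tokens_per_month / tokens_per_sec / SECONDS_PER_HOUR
    # 4. Cost and carbon scale linearly with grid energy.
    return Estimate(gpu_hours, gpu_kwh, grid_kwh,
                    grid_kwh * usd_per_kwh, grid_kwh * kg_co2e_per_kwh)


if __name__ == "__main__":
    # Worked-example inputs: 500M tokens on H100s, PUE 1.2, $0.15/kWh, 0.25 kg CO2e/kWh.
    e = estimate(500e6, gpu_kw=0.7, tokens_per_sec=3000, pue=1.2,
                 usd_per_kwh=0.15, kg_co2e_per_kwh=0.25)
    print(f"{e.gpu_hours:.1f} GPU-h, {e.grid_kwh:.1f} kWh, "
          f"${e.cost_usd:.2f}, {e.co2e_kg:.1f} kg CO2e")
    # prints: 46.3 GPU-h, 38.9 kWh, $5.83, 9.7 kg CO2e
```

Because every step is a multiplication or division, halving throughput doubles energy, cost, and carbon, and raising PUE scales all three by the same factor.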
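Defaults combine published power ratings with rough throughput estimates: an A100 (SXM) is rated at ~0.4 kW and serves on the order of 1,500 tokens/s for a mid-size model; an H100 (SXM) is rated at ~0.7 kW but sustains roughly 3,000 tokens/s, making it more energy-efficient per token despite its higher peak power. Real throughput varies widely with model size, batch size, and serving stack.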
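A short Python check makes the per-token comparison concrete using only the default figures above. It assumes the stated power is the actual draw while serving, which is a simplification; real draw is often below the rated maximum.

```python
# Default figures from the text: (power draw in kW, sustained tokens/s).
# Throughput values are rough, workload-dependent estimates.
GPUS = {
    "A100": (0.4, 1500),
    "H100": (0.7, 3000),
}


def joules_per_token(kw: float, tokens_per_sec: float) -> float:
    # Convert kW to W (J/s), then divide by tokens/s to get J/token.
    return kw * 1000.0 / tokens_per_sec


for name, (kw, tps) in GPUS.items():
    print(f"{name}: {joules_per_token(kw, tps):.3f} J/token")
# A100: 0.267 J/token
# H100: 0.233 J/token
```

With these defaults the H100 uses about 12.5% less energy per token (0.233 ÷ 0.267 ≈ 0.875) even though it draws 75% more power.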
Worked example
Serving 500 million tokens/month on H100s at $0.15/kWh, grid intensity 0.25 kg CO2e/kWh, and PUE 1.2:
- Energy/token: 0.7 kW ÷ 3,000 tok/s ≈ 2.33e-4 kJ/token; ÷ 3,600 kJ/kWh ≈ 6.5e-8 kWh/token
- GPU energy: 500M tokens × ~6.48e-8 kWh/token ≈ 32.4 kWh → with PUE 1.2 ≈ 38.9 kWh
- Cost: ≈ $5.83/month
- Carbon: ≈ 9.7 kg CO2e/month
That is roughly the emissions of driving a petrol car 50 to 80 km: small for a single app, but material at fleet scale.
Tips
- H100s usually win on energy per token despite higher wattage: what matters is power divided by throughput, not peak draw alone.
- Cleaner cloud regions can cut your reported footprint by 5 to 10× without code changes; schedule batch jobs where the grid is greenest.
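Picking a region by carbon intensity can be sketched as below. The region names and intensity values are hypothetical placeholders, not real cloud regions; in practice you would use annual or hourly figures from your provider or a grid-data source. The 38.9 kWh input is the worked example's monthly grid energy.

```python
# Hypothetical grid intensities in kg CO2e/kWh; real values vary by region and hour.
REGION_INTENSITY = {
    "region-a": 0.25,
    "region-b": 0.05,
    "region-c": 0.45,
}


def greenest_region(intensity: dict[str, float]) -> str:
    # Return the region with the lowest carbon intensity.
    return min(intensity, key=lambda region: intensity[region])


grid_kwh = 38.9  # monthly grid energy from the worked example

# Same energy, different grids: emissions scale with intensity alone.
for region, ci in sorted(REGION_INTENSITY.items(), key=lambda kv: kv[1]):
    print(f"{region}: {grid_kwh * ci:.1f} kg CO2e/month")
print("schedule batch jobs in:", greenest_region(REGION_INTENSITY))
# region-b: 1.9 kg CO2e/month
# region-a: 9.7 kg CO2e/month
# region-c: 17.5 kg CO2e/month
# schedule batch jobs in: region-b
```

With these placeholder values the cleanest region emits 5× less than the 0.25 kg CO2e/kWh baseline, the low end of the range quoted in the tip.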
- Combine with the LLM API Cost Calculator to see dollar and carbon budgets together.