What formula does this use?

Energy in kWh equals chip power in kW times number of chips times hours times PUE. Emissions equal that energy times the grid carbon intensity. This mirrors the methodology in Strubell et al. (2019) and Patterson et al. (2021), which both multiply hardware power draw by run time, PUE, and grid intensity.

What is PUE and what value should I use?

Power Usage Effectiveness is the ratio of total facility energy to the energy delivered to the IT equipment. A perfect data centre would score 1.0; the global average is around 1.55, while best-in-class hyperscale facilities reach about 1.1. Use your provider's reported figure if you have it.

Where do I get grid carbon intensity?

Grid intensity is measured in grams of CO2e per kWh and varies by region and time of day. Coal-heavy grids can exceed 700 g/kWh, the global average is around 480 g/kWh, and low-carbon grids like France, Sweden, or Quebec can be under 60 g/kWh. Tools like Electricity Maps publish live figures.

How accurate is the estimate?

It is a first-order estimate. Real consumption depends on chip utilisation, memory and networking overhead, idle time, and embodied hardware emissions, none of which are captured here. Treat the number as an order-of-magnitude guide, not a certified figure for reporting.

Why benchmark against GPT-3 and BERT?

Published estimates give a sense of scale: training GPT-3 is estimated at roughly 552 tonnes CO2e and the original BERT at roughly 0.65 tonnes (which Strubell et al. compare to roughly one passenger's trans-American flight). Comparing your run to these anchors makes an abstract kilogram figure intuitive.

What is the AI Model Training Carbon Estimator?

Enter your accelerator type, chip count, training hours, data-centre PUE, and grid carbon intensity to estimate training CO2e using the Strubell and Patterson methodology. The tool benchmarks the result against GPT-3 and BERT training footprints. It is aimed at ML engineers and AI-ethics teams, and it runs free in your browser on Gera Tools, with nothing uploaded.

AI Model Training Carbon Estimator

Name: AI Model Training Carbon Estimator
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Training large models consumes a lot of electricity, and the resulting carbon depends heavily on where and when you train. This estimator turns your hardware and run details into a CO2e figure using the same multiply-through approach used in the widely cited Strubell et al. and Patterson et al. papers.

How it works

The calculation runs in four steps, multiplying chip power, chip count, hours, PUE, and grid intensity together:

power_kW   = chip_TDP_watts × chips / 1000
energy_kWh = power_kW × hours × PUE
emissions  = energy_kWh × grid_gCO2e_per_kWh   (grams)
tonnes     = emissions / 1,000,000

PUE scales the IT load up to the whole facility (cooling, power conversion), and the grid intensity converts energy into carbon. Default accelerator power draws use published board TDPs: A100 ≈ 400 W, H100 ≈ 700 W, V100 ≈ 300 W, and a TPU v4 chip ≈ 200 W.
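The four steps above can be sketched in Python as follows. This is a minimal illustration rather than the tool's actual source: the function name, parameter names, and the `DEFAULT_TDP_WATTS` table are hypothetical, and the TDP values are simply the approximate defaults quoted in the text.

```python
# Approximate board TDPs in watts, copied from the defaults listed above.
DEFAULT_TDP_WATTS = {"A100": 400, "H100": 700, "V100": 300, "TPU v4": 200}


def estimate_training_emissions(chip, chips, hours, pue, grid_g_per_kwh):
    """Return (energy_kwh, tonnes_co2e) for a training run.

    Hypothetical helper: names and validation rules are illustrative only.
    """
    if chips <= 0 or hours < 0:
        raise ValueError("chips must be positive and hours non-negative")
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    power_kw = DEFAULT_TDP_WATTS[chip] * chips / 1000   # step 1: IT power
    energy_kwh = power_kw * hours * pue                 # step 2: facility energy
    grams = energy_kwh * grid_g_per_kwh                 # step 3: emissions in grams
    return energy_kwh, grams / 1_000_000                # step 4: grams to tonnes


# Example inputs chosen for illustration: 64 A100s, 600 hours, PUE 1.1, 480 g/kWh.
energy, tonnes = estimate_training_emissions("A100", 64, 600, 1.1, 480)
print(f"{energy:,.0f} kWh, {tonnes:.2f} t CO2e")
```

Returning energy alongside emissions keeps the intermediate kWh figure visible, which is useful when comparing against a provider's metered usage.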

Understanding each input

Chip count is the total number of accelerators running in parallel. A run on 64 A100s for 600 hours has the same chip-hours as a run on 128 A100s for 300 hours — and the same energy consumption before PUE.
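The chip-hours equivalence can be checked directly. The sketch below uses a hypothetical helper, `it_energy_kwh`, and the 400 W A100 figure quoted earlier; it assumes every chip draws its full TDP for the whole run.

```python
def it_energy_kwh(tdp_watts, chips, hours):
    """IT-side energy in kWh, before PUE is applied (hypothetical helper)."""
    return tdp_watts * chips * hours / 1000


A100_TDP_WATTS = 400  # approximate board TDP used as the default in this tool

run_a = it_energy_kwh(A100_TDP_WATTS, chips=64, hours=600)
run_b = it_energy_kwh(A100_TDP_WATTS, chips=128, hours=300)

# Both runs total 38,400 chip-hours, so the IT energy matches.
print(run_a, run_b, run_a == run_b)
```

In practice, doubling the chip count rarely halves the wall-clock time exactly, so the two runs would only match if scaling were perfectly linear.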

PUE (Power Usage Effectiveness) is the ratio of the data centre’s total energy draw to the energy consumed by the IT equipment (the chips). A perfect PUE is 1.0 (no overhead), but real data centres always have cooling, power conditioning, and lighting loads. The global average is around 1.55. Modern hyperscale facilities run at 1.1 to 1.2; a typical co-location facility is closer to 1.4 to 1.6. If your cloud provider publishes a PUE for the region you used, enter that value.
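To make the ratio concrete, here is a small sketch that derives a PUE from two meter readings and shows how much overhead energy a given PUE implies. The function names and the meter readings are made up for illustration.

```python
def pue_from_meters(total_facility_kwh, it_equipment_kwh):
    """PUE = total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("facility energy cannot be below IT energy")
    return total_facility_kwh / it_equipment_kwh


def overhead_kwh(it_equipment_kwh, pue):
    """Energy spent on cooling, power conversion, and so on for a given IT load."""
    return it_equipment_kwh * (pue - 1.0)


# Made-up meter readings for illustration only.
pue = pue_from_meters(total_facility_kwh=1200, it_equipment_kwh=1000)
print(f"PUE = {pue:.2f}")
print(f"Overhead on a 1,000 kWh IT load: {overhead_kwh(1000, pue):.0f} kWh")
```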

Grid carbon intensity (g CO2e/kWh) is the most variable input and often the most impactful one. This measures how much carbon is emitted per unit of electricity generated in your grid region. Values vary enormously: grids powered mainly by coal can exceed 700 g/kWh, the global average is around 480 g/kWh, and grids with large shares of hydro or nuclear power (France, Norway, Quebec, Sweden) can be under 60 g/kWh. Tools such as Electricity Maps provide near-real-time values by region and time of day.

Worked example

For illustration: a fine-tuning run using 32 H100 GPUs for 72 hours, run in a data centre with PUE 1.2, on a grid with an intensity of 350 g/kWh.

power_kW   = 700 W × 32 / 1000 = 22.4 kW
energy_kWh = 22.4 × 72 × 1.2  = 1,935 kWh
emissions  = 1,935 × 350       = 677,250 g CO2e
tonnes     = 677,250 / 1,000,000 ≈ 0.68 tonnes CO2e

For reference, the published estimate for training the original BERT model is approximately 0.65 tonnes CO2e, and training GPT-3 is estimated at roughly 552 tonnes CO2e. Moving the same illustrative run to a low-carbon grid at 50 g/kWh would reduce the emissions from 0.68 tonnes to under 0.10 tonnes — a factor of seven from the same hardware and runtime, purely from grid choice.
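The worked example and the grid comparison can be reproduced with a few lines of Python. This is a sketch with a hypothetical `tonnes_co2e` helper; it carries the unrounded energy (1,935.36 kWh) through the calculation, so intermediate values differ slightly from the rounded figures above.

```python
def tonnes_co2e(tdp_watts, chips, hours, pue, grid_g_per_kwh):
    """Tonnes of CO2e for a run, following the multiply-through formula."""
    energy_kwh = tdp_watts * chips / 1000 * hours * pue
    return energy_kwh * grid_g_per_kwh / 1_000_000


# Worked example: 32 H100s (700 W each), 72 hours, PUE 1.2.
baseline = tonnes_co2e(700, 32, 72, 1.2, grid_g_per_kwh=350)
low_carbon = tonnes_co2e(700, 32, 72, 1.2, grid_g_per_kwh=50)

print(f"350 g/kWh grid: {baseline:.2f} t CO2e")
print(f" 50 g/kWh grid: {low_carbon:.2f} t CO2e")
print(f"Reduction factor: {baseline / low_carbon:.1f}x")

# Published anchors quoted in the text, in tonnes CO2e.
BERT_TONNES, GPT3_TONNES = 0.65, 552
print(f"Baseline vs BERT: {baseline / BERT_TONNES:.2f}x")
print(f"Baseline vs GPT-3: {baseline / GPT3_TONNES:.4f}x")
```

Because emissions are linear in grid intensity, the reduction factor is just the ratio of the two intensities (350 / 50), independent of hardware and runtime.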

What this estimate does not capture

This is a useful planning estimate, not a certified reporting figure. It does not include:

Chip utilisation: TDP is a rated maximum, and actual draw depends on workload. Average power is often 60–80% of TDP for well-loaded training runs.
Host and networking overhead: host CPUs, system memory, NVLink switches, and InfiniBand interconnects add load not captured in the chip TDP.
Idle time: Startup, checkpointing, and evaluation pauses reduce average power.
Embodied hardware emissions: Manufacturing A100s and H100s at TSMC and assembling servers carries its own carbon cost not included here.
Cooling method: Water-cooled and air-cooled facilities reach different PUEs, and a single PUE figure does not distinguish between them.
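One way to see how much the utilisation gap alone could matter is to scale TDP by an assumed average-draw factor. The estimator itself does not do this; the `utilisation` parameter below is a hypothetical extension, and the 0.6 to 0.8 range is taken from the note on chip utilisation above.

```python
def tonnes_with_utilisation(tdp_watts, chips, hours, pue, grid_g_per_kwh,
                            utilisation=1.0):
    """Tonnes CO2e with TDP scaled by an assumed average-draw fraction.

    utilisation=1.0 reproduces the plain TDP-based estimate.
    """
    if not 0.0 < utilisation <= 1.0:
        raise ValueError("utilisation must be in (0, 1]")
    avg_power_kw = tdp_watts * utilisation * chips / 1000
    return avg_power_kw * hours * pue * grid_g_per_kwh / 1_000_000


# Same illustrative run as the worked example: 32 H100s, 72 h, PUE 1.2, 350 g/kWh.
for u in (1.0, 0.8, 0.6):
    t = tonnes_with_utilisation(700, 32, 72, 1.2, 350, utilisation=u)
    print(f"utilisation {u:.0%}: {t:.2f} t CO2e")
```

This only adjusts the chip-level draw; the other omissions in the list (host overhead, embodied emissions) push the true figure in the opposite direction.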

For formal carbon reporting (Scope 2 emissions under the GHG Protocol), obtain actual energy invoices from your provider and apply your region’s verified grid-emission factor rather than using this estimate.