AI Image Inference Speed Benchmarks

Compare generation speed across GPUs and cloud providers for SD models



Generation speed for Stable Diffusion and Flux.1 depends on three things: the model (the size of its denoising network and its native resolution), the GPU, and the output resolution. This reference estimates images per minute so you can size hardware, compare cloud GPUs, or decide between SD 1.5 and SDXL for a batch job.

How it works

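Each sampling step is one forward pass through the denoising network (the UNet in SD 1.5 and SDXL, a transformer in Flux.1). Ignoring fixed per-image overhead such as text encoding and VAE decoding, total time is roughly: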

time_per_image ≈ steps × time_per_step(model, gpu, resolution)
images_per_min ≈ 60 / time_per_image
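The two formulas translate directly into a small Python helper. This is a minimal sketch: `time_per_step_s` is a number you would measure on your own GPU at your own resolution, and the 0.05 s value in the usage line is a hypothetical placeholder, not a benchmark result.

```python
def time_per_image(steps: int, time_per_step_s: float) -> float:
    """Seconds per image: one denoiser pass per sampling step.

    This is an approximation: it ignores fixed per-image overhead
    such as text encoding and VAE decoding.
    """
    if steps <= 0 or time_per_step_s <= 0:
        raise ValueError("steps and time_per_step_s must be positive")
    return steps * time_per_step_s


def images_per_min(steps: int, time_per_step_s: float) -> float:
    """Throughput in images per minute for one sequential stream of images."""
    return 60.0 / time_per_image(steps, time_per_step_s)


if __name__ == "__main__":
    # Hypothetical per-step time of 0.05 s, for illustration only.
    print(f"{images_per_min(30, 0.05):.1f} images/min")  # 40.0 images/min
```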

The benchmark table is calibrated at a baseline step count (25–30 steps). When you change the step slider, the estimate scales linearly because the number of denoiser passes scales linearly. Per-step cost scales roughly with pixel count, so moving SDXL from 1024×1024 to 1536×1536 (2.25× the pixels) more than doubles the per-image cost.
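Rescaling a calibrated baseline to a new step count and resolution can be sketched the same way. The sketch assumes per-image cost is proportional to steps × width × height, which is the rough approximation described above rather than a measured model; `scale_images_per_min` and the 9.0 images/min baseline are hypothetical names and numbers used only for illustration.

```python
def scale_images_per_min(
    baseline_ipm: float,
    baseline_steps: int,
    baseline_px: tuple[int, int],
    steps: int,
    px: tuple[int, int],
) -> float:
    """Rescale a calibrated images/min figure to a new step count and resolution.

    Assumes cost per image is proportional to steps * (width * height),
    so throughput falls by the product of the two scaling factors.
    """
    step_factor = steps / baseline_steps
    pixel_factor = (px[0] * px[1]) / (baseline_px[0] * baseline_px[1])
    return baseline_ipm / (step_factor * pixel_factor)


if __name__ == "__main__":
    # SDXL at 1024x1024 -> 1536x1536: the pixel count grows by 2.25x.
    print((1536 * 1536) / (1024 * 1024))  # 2.25
    # Hypothetical baseline: 9.0 images/min at 30 steps, 1024x1024.
    print(scale_images_per_min(9.0, 30, (1024, 1024), 30, (1536, 1536)))  # 4.0
```

Dropping the step count instead (for example from 30 to 20 at the same resolution) raises the estimate by the same linear factor, 1.5× in that case.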

Notes and caveats

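  • Enable xFormers or PyTorch SDPA (scaled dot-product attention). Both use memory-efficient attention kernels that cut VRAM use and speed up generation by roughly 20–40% on most GPUs.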
  • Batch for throughput, not latency. A100/H100 win when you generate many images at once; for one image a high-clock 4090 is often faster.
  • VRAM gates which models you can run more than how fast they run. A 3060 (12GB) can run SDXL, but whenever weights have to be offloaded to system RAM, generation becomes far slower than on a 24GB card that keeps everything resident.
  • Samplers matter. DPM++ 2M can hit good quality in ~20 steps; older Euler-a may need 30+, directly changing your images-per-minute.
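To see the throughput/latency trade-off on your own hardware, time one batch at several batch sizes and compare. In this sketch the `seconds_per_batch` timings are hypothetical inputs you would measure yourself, and `batch_tradeoff` / `best_batch_size` are illustrative helpers, not part of any library.

```python
from typing import Dict, List, Optional, Tuple


def batch_tradeoff(seconds_per_batch: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (batch_size, latency_s, images_per_min) for each measured batch size.

    Latency is the wall-clock time for the whole batch, so every image in it
    waits that long; throughput counts all images the batch produces.
    """
    rows = []
    for batch_size, seconds in sorted(seconds_per_batch.items()):
        rows.append((batch_size, seconds, 60.0 * batch_size / seconds))
    return rows


def best_batch_size(
    seconds_per_batch: Dict[int, float],
    max_latency_s: Optional[float] = None,
) -> int:
    """Pick the batch size with the highest images/min, optionally under a latency cap."""
    candidates = [
        row
        for row in batch_tradeoff(seconds_per_batch)
        if max_latency_s is None or row[1] <= max_latency_s
    ]
    if not candidates:
        raise ValueError("no measured batch size meets the latency cap")
    return max(candidates, key=lambda row: row[2])[0]


if __name__ == "__main__":
    # Hypothetical timings (seconds per batch), for illustration only.
    timings = {1: 4.0, 4: 10.0, 8: 18.0}
    for size, latency, ipm in batch_tradeoff(timings):
        print(f"batch {size}: {latency:.1f} s latency, {ipm:.1f} images/min")
    print(best_batch_size(timings))        # 8
    print(best_batch_size(timings, 12.0))  # 4
```

With those made-up timings, batch 8 gives the most images per minute, but a 12 s latency cap pushes the choice back to batch 4. Step count still multiplies in: if a sampler such as DPM++ 2M reaches acceptable quality at 20 steps instead of 30, each batch finishes in roughly two thirds of the time.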