Replicate Model Cost Estimator

Estimate Replicate run costs for popular image and video models



Replicate bills by the second of compute, not per image or per token, so two runs of the same model can differ widely in cost depending on the GPU tier and how long the prediction takes. That makes budgeting tricky: a Flux generation on an H100 and an SDXL run on an A40 land at different prices even though each produces a single image. This estimator multiplies the per-second hardware rate by your run time and monthly volume to give a concrete monthly cost.

How it works

Choose a model preset and the tool loads a typical hardware tier and run time; change either value and the preset switches to Custom. The estimator multiplies the tier’s per-second rate by your run time in seconds to get the cost per run, then multiplies that by your monthly volume to get the monthly total. The hardware dropdown shows each tier’s per-second price so you can see exactly where the money goes. Moving from an H100 to an A40, for instance, roughly halves the rate.
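The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the estimator's actual code: the tier names and per-second rates are hypothetical placeholders rather than Replicate's published prices, and the function names are assumptions made for this example.

```python
# Hypothetical per-second rates in USD. These are placeholders for
# illustration only; look up current prices before budgeting.
RATES_PER_SECOND = {
    "gpu-large": 0.0015,
    "gpu-medium": 0.00075,
}


def cost_per_run(rate_per_second: float, run_seconds: float) -> float:
    """Cost of one prediction: per-second rate times run time in seconds."""
    if rate_per_second < 0 or run_seconds < 0:
        raise ValueError("rate and run time must be non-negative")
    return rate_per_second * run_seconds


def monthly_cost(rate_per_second: float, run_seconds: float, runs_per_month: int) -> float:
    """Monthly total: cost per run times the number of runs per month."""
    if runs_per_month < 0:
        raise ValueError("monthly volume must be non-negative")
    return cost_per_run(rate_per_second, run_seconds) * runs_per_month


if __name__ == "__main__":
    # Example: a 4-second run, 10,000 runs a month, on each placeholder tier.
    for tier, rate in RATES_PER_SECOND.items():
        total = monthly_cost(rate, run_seconds=4.0, runs_per_month=10_000)
        print(f"{tier}: ${cost_per_run(rate, 4.0):.4f} per run, ${total:.2f} per month")
```

Because the cost is linear in all three inputs, halving the rate, the run time, or the volume halves the monthly total.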

Tips for accurate estimates

  • Measure real run times. Cold starts add seconds; run a handful of live predictions and use the average rather than the preset.
  • Right-size the GPU. Smaller models often run fine on a T4 or L40S at a fraction of an H100’s rate, so don’t pay for hardware the model can’t saturate.
  • Account for failures and retries. Failed runs still consume compute; pad your monthly volume slightly if your inputs are unreliable.
  • Compare to dedicated APIs. For very high volume, a fixed-price endpoint (for example from Stability AI or OpenAI) can beat per-second pricing, so estimate both before committing.
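The tips above can be turned into small adjustments to the estimator's inputs. The sketch below is illustrative only: the function names are hypothetical, the retry model assumes every failed run is retried until it succeeds and that a failed attempt costs as much as a successful one, and the prices in the usage lines are made-up numbers.

```python
from statistics import mean


def average_run_seconds(measured_seconds: list[float]) -> float:
    """Average of run times measured from live predictions."""
    if not measured_seconds:
        raise ValueError("need at least one measured run time")
    return mean(measured_seconds)


def padded_volume(successful_runs: float, failure_rate: float) -> float:
    """Expected billed attempts when a fraction of attempts fail and are retried.

    Assumption: each attempt fails independently with probability
    failure_rate, so the expected attempts per success is 1 / (1 - failure_rate).
    """
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    return successful_runs / (1 - failure_rate)


def cheaper_option(rate_per_second: float, run_seconds: float, fixed_price_per_run: float) -> str:
    """Compare per-second billing against a fixed price per run."""
    per_second_cost = rate_per_second * run_seconds
    return "per-second" if per_second_cost < fixed_price_per_run else "fixed-price"


if __name__ == "__main__":
    seconds = average_run_seconds([3.0, 5.0, 4.0])   # measured samples (made up)
    attempts = padded_volume(900, failure_rate=0.1)  # 900 successes wanted
    print(f"average run time: {seconds:.1f} s, billed attempts: {attempts:.0f}")
    print(cheaper_option(0.001, seconds, fixed_price_per_run=0.01))
```

Feeding the measured average and the padded volume into the estimator, instead of the preset run time and the raw volume, gives a figure closer to what a real workload is likely to cost.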