OpenAI Assistants API Thread Cost Calculator

Calculate the true cost of long-running Assistants API threads



The OpenAI Assistants API is convenient because it manages conversation state for you, but that convenience hides a sharp cost curve. By default, each run sends the thread's accumulated history as input (up to what fits in the model's context window), so a long-lived thread pays to re-process its own history again and again. This calculator shows the cumulative cost as a thread grows and where a truncation cap pays for itself.

How the cost grows

If each turn adds t tokens, then by turn n the input sent is roughly t × n. Summed across all n turns, the total input processed is t × n(n+1)/2, which is proportional to n²: quadratic growth. That is why a 50-turn support thread can cost far more than fifty independent calls: under this model it processes roughly 25 times as many input tokens.

turn k input tokens ≈ avg_tokens_per_turn × k
total input         ≈ avg_tokens_per_turn × (1 + 2 + ... + n)
                    = avg_tokens_per_turn × n(n+1)/2
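
The formula above can be turned into a few lines of Python. This is a minimal sketch of the uncapped model only: the function names are made up for illustration, and price_per_million_tokens is a placeholder you would replace with the current input price for your model, not a quoted rate.

```python
def total_input_tokens(avg_tokens_per_turn: int, turns: int) -> int:
    """Input tokens summed over all turns, assuming turn k re-sends k turns of history."""
    # avg_tokens_per_turn * (1 + 2 + ... + turns) = avg_tokens_per_turn * n(n+1)/2
    return avg_tokens_per_turn * turns * (turns + 1) // 2


def independent_input_tokens(avg_tokens_per_turn: int, turns: int) -> int:
    """Baseline: the same number of calls, each sending only its own turn."""
    return avg_tokens_per_turn * turns


def input_cost(total_tokens: int, price_per_million_tokens: float) -> float:
    """Convert a token count to cost; the price is an assumption supplied by the caller."""
    return total_tokens / 1_000_000 * price_per_million_tokens


# Example: 50 turns at 200 tokens per turn (illustrative numbers).
thread_tokens = total_input_tokens(200, 50)          # 255000
baseline_tokens = independent_input_tokens(200, 50)  # 10000
print(thread_tokens, baseline_tokens, thread_tokens / baseline_tokens)
```

With these illustrative numbers the thread processes 255,000 input tokens against 10,000 for fifty independent calls, a ratio of (n + 1) / 2 = 25.5.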

A truncation cap flattens the curve: once the running context hits the cap, each further turn re-sends only the cap, turning quadratic growth back into linear growth.
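
The effect of a cap can be sketched the same way. The model here is a simplification assumed for illustration: turn k sends min(avg_tokens_per_turn × k, cap) input tokens, and the function names and example numbers are hypothetical.

```python
def capped_total_input_tokens(avg_tokens_per_turn: int, turns: int, cap: int) -> int:
    """Total input tokens when each turn re-sends at most `cap` tokens of context."""
    return sum(min(avg_tokens_per_turn * k, cap) for k in range(1, turns + 1))


def first_truncated_turn(avg_tokens_per_turn: int, cap: int) -> int:
    """First turn whose untruncated context would exceed the cap."""
    return cap // avg_tokens_per_turn + 1


# Example: 50 turns at 200 tokens per turn with a 2,000-token cap.
capped = capped_total_input_tokens(200, 50, 2000)
print(capped, first_truncated_turn(200, 2000))
```

In this example the cap starts to bite at turn 11. Turns 1 to 10 contribute 11,000 tokens and the remaining 40 turns contribute 2,000 each, for 91,000 in total, compared with 255,000 for the same thread without a cap.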

Tips and notes

  • Set a truncation strategy early. The Assistants API supports truncation_strategy; use it before threads get long, not after a surprise bill.
  • Summarize old turns. Replacing stale early turns with a short summary keeps the thread small without losing the thread’s intent.
  • Start fresh threads. For unrelated questions, a new thread is almost always cheaper than appending to a giant one.
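
As a rough illustration of the first tip, the sketch below builds the keyword arguments for a run that keeps only the most recent messages. The truncation_strategy object (type "last_messages" with a last_messages count) and the max_prompt_tokens limit are run parameters in the Assistants API; the helper name build_run_params and the example values are hypothetical, and the commented-out call assumes the official openai Python SDK.

```python
from typing import Any, Dict, Optional


def build_run_params(
    assistant_id: str,
    last_messages: int,
    max_prompt_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Build run arguments that limit how much thread history each run sends."""
    if last_messages < 1:
        raise ValueError("last_messages must be at least 1")
    params: Dict[str, Any] = {
        "assistant_id": assistant_id,
        # Keep only the N most recent messages in the run's context.
        "truncation_strategy": {"type": "last_messages", "last_messages": last_messages},
    }
    if max_prompt_tokens is not None:
        # Optional upper bound on prompt tokens used across the run.
        params["max_prompt_tokens"] = max_prompt_tokens
    return params


params = build_run_params("asst_example", last_messages=10, max_prompt_tokens=4000)
print(params)

# Assumed usage with the openai SDK (not executed here):
# client.beta.threads.runs.create(thread_id=thread_id, **params)
```

Capping by message count is the simpler of the two controls; a token limit is closer to what the cost model above measures, so it may be worth setting both.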