Assistants API thread cost calculator
The OpenAI Assistants API is convenient because it manages conversation state for you — but that convenience hides a sharp cost curve. Every run re-sends the entire thread as input, so a long-lived thread pays to re-process its own history again and again. This calculator shows the cumulative cost as a thread grows and where a truncation cap pays for itself.
How the cost grows
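If each turn adds t tokens, then by turn n the input sent is roughly
t × n. Summed across all turns, the total input processed is proportional to
n², which is quadratic growth. That is why a 50-turn support thread can cost far more
than fifty independent calls: under this model the thread processes 1,275 × t input tokens, about 25 times the 50 × t that fifty single-turn calls would.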
turn k input tokens ≈ avg_tokens_per_turn × k
total input ≈ avg_tokens_per_turn × (1 + 2 + ... + n)
= avg_tokens_per_turn × n(n+1)/2
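The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than the calculator's actual implementation: the function names are made up for this example, and the price of 2.50 USD per million input tokens is a placeholder, so substitute the current rate for your model.

```python
def cumulative_input_tokens(avg_tokens_per_turn: int, turns: int) -> int:
    """Total input tokens processed over `turns` runs with no truncation."""
    # Turn k re-sends roughly k turns of history, so the total is
    # avg_tokens_per_turn * (1 + 2 + ... + turns) = avg * n(n+1)/2.
    return avg_tokens_per_turn * turns * (turns + 1) // 2


def input_cost_usd(total_tokens: int, price_per_million_usd: float) -> float:
    """Convert a token count to dollars at a given per-million-token price."""
    return total_tokens / 1_000_000 * price_per_million_usd


if __name__ == "__main__":
    thread_total = cumulative_input_tokens(avg_tokens_per_turn=200, turns=50)
    independent_total = 200 * 50  # fifty single-turn calls, no shared history
    print(thread_total, independent_total, thread_total / independent_total)
    # prints: 255000 10000 25.5

    # Placeholder price, not a quoted rate.
    print(input_cost_usd(thread_total, price_per_million_usd=2.50))
```

The model ignores output tokens, system instructions, and tool results, all of which add to the real bill, so treat the result as a lower bound on input cost.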
A truncation cap flattens the curve: once the running context reaches the cap, each further turn re-sends at most the cap's worth of tokens, so total input grows linearly with the number of turns instead of quadratically.
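To see the effect of a cap in numbers, here is a small Python sketch that sums per-turn input with an optional token cap. The function names and the example values (200 tokens per turn, a 2,000-token cap) are assumptions chosen for illustration. A real truncation setting may count messages rather than tokens, so this is only an approximation of the idea.

```python
from typing import Optional


def cumulative_input_tokens_capped(
    avg_tokens_per_turn: int, turns: int, cap_tokens: Optional[int] = None
) -> int:
    """Sum per-turn input tokens, limiting each turn to `cap_tokens` if given."""
    total = 0
    for k in range(1, turns + 1):
        context = avg_tokens_per_turn * k  # history re-sent on turn k
        if cap_tokens is not None:
            context = min(context, cap_tokens)  # cap flattens the per-turn cost
        total += context
    return total


def first_capped_turn(avg_tokens_per_turn: int, cap_tokens: int) -> int:
    """First turn whose uncapped context would exceed the cap."""
    return cap_tokens // avg_tokens_per_turn + 1


if __name__ == "__main__":
    uncapped = cumulative_input_tokens_capped(200, 50)
    capped = cumulative_input_tokens_capped(200, 50, cap_tokens=2000)
    print(uncapped, capped, uncapped - capped)
    # prints: 255000 91000 164000
    print(first_capped_turn(200, 2000))
    # prints: 11
```

In this example the cap starts saving tokens at turn 11, and from then on every extra turn costs a flat 2,000 input tokens.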
Tips and notes
- Set a truncation strategy early. The Assistants API supports truncation_strategy; use it before threads get long, not after a surprise bill.
- Summarize old turns. Replacing stale early turns with a short summary keeps the thread small without losing the thread’s intent.
- Start fresh threads. For unrelated questions, a new thread is almost always cheaper than appending to a giant one.
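As a rough sketch of the first tip, the helper below builds run parameters that include a truncation_strategy. It assumes the run-level truncation_strategy object accepts a type of "last_messages" with a last_messages count, which is how the API reference described it at the time of writing; verify the shape against the current documentation. The helper name and the assistant ID are hypothetical, and the code only builds a dictionary, so it runs without network access or an API key.

```python
def build_run_params(assistant_id: str, last_messages: int) -> dict:
    """Build keyword arguments for a run that keeps only recent messages.

    Hypothetical helper: the truncation_strategy shape is an assumption
    based on the Assistants API reference and should be double-checked.
    """
    if last_messages < 1:
        raise ValueError("last_messages must be at least 1")
    return {
        "assistant_id": assistant_id,
        "truncation_strategy": {
            "type": "last_messages",
            "last_messages": last_messages,
        },
    }


if __name__ == "__main__":
    params = build_run_params("asst_example", last_messages=10)  # placeholder ID
    print(params)
    # With the official openai Python package, these could plausibly be
    # passed along the lines of:
    #   client.beta.threads.runs.create(thread_id=thread_id, **params)
```

Capping by message count bounds the number of turns re-sent per run, but long individual messages can still make each run expensive, so it pairs well with summarizing old turns.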