Why does an Assistants API thread cost more than a single chat call?

On each run the Assistants API re-sends the entire thread as input. Turn 10 pays to re-process turns 1 through 9 again, so cumulative input cost grows roughly quadratically with thread length.

What does truncation do?

A truncation cap limits how many prior tokens are re-sent. Once the thread exceeds the cap, each turn pays a flat amount instead of an ever-growing one, which bounds the cost.

Does this include output token cost?

This tool focuses on the dominant input-token growth, which is where Assistants API surprises come from. Add your output cost separately with an LLM API cost calculator.

Is my data sent anywhere?

No. The calculator runs entirely in your browser. Nothing you enter is uploaded, stored or logged.

What is the OpenAI Assistants API Thread Cost Calculator?

Free Assistants API thread cost calculator. The Assistants API re-sends the full thread on every turn, so cost grows quadratically. See cumulative spend as a thread lengthens and the truncation point that caps it. It runs free in your browser on Gera Tools, with nothing uploaded.

OpenAI Assistants API Thread Cost Calculator

Name: OpenAI Assistants API Thread Cost Calculator
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Assistants API thread cost calculator

The OpenAI Assistants API is convenient because it manages conversation state for you — but that convenience hides a sharp cost curve. Every run re-sends the entire thread as input, so a long-lived thread pays to re-process its own history again and again. This calculator shows the cumulative cost as a thread grows and where a truncation cap pays for itself.

How the cost grows

If each turn adds t tokens, then by turn n the input sent is roughly t × n. Summed across all turns the total input processed is proportional to n² — quadratic growth. That is why a 50-turn support thread can cost far more than fifty independent calls.

turn k input tokens ≈ avg_tokens_per_turn × k
total input         ≈ avg_tokens_per_turn × (1 + 2 + ... + n)
                    = avg_tokens_per_turn × n(n+1)/2

A truncation cap flattens the curve: once the running context hits the cap, each further turn re-sends only the cap, turning quadratic growth back into linear growth.

Illustrative example

Consider a support bot where each turn adds about 200 tokens (user message plus tool output plus assistant reply). Over 30 turns:

Turn 1 costs for 200 input tokens
Turn 15 costs for about 3,000 input tokens (15 × 200)
Turn 30 costs for about 6,000 input tokens
Total input across 30 turns: about 93,000 tokens (sum 1 to 30 × 200)

Without truncation, a 30-turn thread processed about 460 times more input than a single-turn call. At $0.01 per thousand input tokens, that is roughly $0.93 just in input costs for one thread — which may sound modest until you multiply by thousands of concurrent threads per day.

With a truncation cap of 4,000 tokens, once the thread exceeds that cap each turn pays for only 4,000 tokens of input. For the same 30 turns, total input costs stay much flatter. The calculator shows the exact break-even point where enabling truncation saves more than it costs in lost context.

Assistants API vs Chat Completions API

The Chat Completions API requires your application to manage message history itself — you pass the full conversation array in each request. This is transparent: you control exactly what gets sent and can prune or summarise old messages before the call. The Assistants API abstracts this away, which is convenient but removes that control and hides the growing cost. If cost control is a priority, managing history manually with Chat Completions can be cheaper because you decide the truncation strategy at every turn.

Cost control strategies

Set a truncation strategy early. The Assistants API supports truncation_strategy on each run. Use it from the start of a thread, not after a surprise bill. The last_messages truncation type keeps only the N most recent messages; the auto type lets OpenAI decide. For predictable costs, last_messages with a fixed count is more controllable.

Summarise old turns. A common pattern is to inject a short “conversation summary so far” as a system message when the thread exceeds a threshold, then delete the original detailed turns via the thread messages API. This preserves intent at a fraction of the token cost.

Start fresh threads for new topics. A thread accumulates everything ever said in it. For a support application where each ticket is a separate issue, create a new thread per ticket rather than maintaining one long thread per customer. This limits the maximum thread length to the length of a single conversation.

Monitor costs per thread. The Assistants API run objects return token counts. Log these per run and set an alert if any thread exceeds a token threshold — an unexpectedly long thread often signals a loop or runaway tool call rather than a genuinely long conversation.