Why does each turn cost more than the last?

LLMs are stateless, so every turn re-sends the entire conversation history as input. A 6-turn conversation sends turn 1 six times, turn 2 five times, and so on — input grows roughly quadratically, which this calculator approximates.

What is the escalation rate for?

Not every ticket can be resolved by AI. The escalation rate is the percentage of tickets that still get handed to a human, so the blended cost is: AI cost for all tickets + human cost for the escalated fraction.

Are the human cost numbers realistic?

The human comparison uses your hourly rate times an average handle time. Real fully-loaded agent cost includes benefits, tooling, and management overhead, so treat the human figure as a conservative floor.

Does the AI replace agents entirely?

Usually no. The realistic model is deflection — AI resolves the simple majority and escalates the rest. Tune the escalation rate to your deflection target to see the blended cost.

What is the AI Customer Support Cost Calculator?

Calculate the true cost of AI customer support. Model tickets per day, conversation turns, context retrieval tokens, escalation rate, and model choice to get cost per ticket and monthly spend, benchmarked against human agent cost. It runs free in your browser on Gera Tools, with nothing uploaded.

AI Customer Support Cost Calculator

Name: AI Customer Support Cost Calculator
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Get one useful tool a week

Like this tool? Enter your email and we'll send you one genuinely useful Gera tool a week — plus a link to come back to this one. No spam, one-click unsubscribe any time.

What does AI customer support actually cost?

“Pennies per conversation” is the pitch, but a real support deployment re-sends conversation history every turn, injects retrieved knowledge-base context, and still escalates a fraction of tickets to humans. This calculator models all three so you get a credible cost per ticket and monthly spend, side by side with the human-agent equivalent.

How the cost is built up

Because LLMs are stateless, every turn re-sends the whole conversation so far. For a conversation of n turns where each turn adds roughly the same number of tokens, total input tokens scale with the triangular number of turns:

turns_input = tokens_per_turn × (n × (n + 1) / 2)
ai_cost_per_ticket = (turns_input/1e6 × in_price)
                   + (output_tokens/1e6 × out_price)
blended = ai_cost_per_ticket
        + escalation_rate × human_cost_per_ticket

That quadratic growth is why long conversations get expensive fast — and why trimming history or summarising it matters.

Worked example

Imagine a support bot handling 500 tickets a day. A typical conversation runs 5 turns, each adding 300 tokens of new content (including retrieved context). The cumulative input across all turns is:

300 × (5 × 6 / 2) = 300 × 15 = 4,500 input tokens per ticket

At a mid-tier model priced at $1.00 per million input tokens and $3.00 per million output tokens with 150 tokens of output per turn:

input cost  = 4,500 / 1,000,000 × $1.00 = $0.0045
output cost = 750   / 1,000,000 × $3.00 = $0.00225
ai cost per ticket ≈ $0.0068

With a 20% escalation rate and a human agent handling time costing $4.50 per ticket:

blended = $0.0068 + 0.20 × $4.50 = $0.0068 + $0.90 ≈ $0.91 per ticket
monthly (500/day × 30 days) = 15,000 tickets × $0.91 ≈ $13,600

Compare that to zero escalation with humans alone: 15,000 × $4.50 = $67,500. This is the model the calculator is running.

What affects the result most

Conversation length. An 8-turn ticket costs more than twice as much as a 4-turn ticket because of how history compounds.
Context tokens per turn. Knowledge-base retrieval that injects a 500-token excerpt every turn adds up faster than the actual reply.
Escalation rate. Even a 30% escalation brings the blended cost close to the human figure on expensive agents.
Model choice. A 10× price difference between a frontier and a lightweight model dominates every other factor on high-volume queues.

Tips to lower cost per ticket

Cap conversation length by summarising old turns rather than re-sending them verbatim. Route trivial intents — password resets, order status — to a cheap model (GPT-4o mini, Claude Haiku, Gemini Flash) and reserve a frontier model for complex complaints. Compress retrieved context with chunking and re-ranking so only the most relevant passage lands in each turn. A well-tuned retrieval pipeline can cut tokens per turn by 40% with no quality loss. Most of the bill hides in re-sent context, not in the final answer.