Token counting, demystified
The token count you are billed for is almost never just your prompt text. Chat APIs add structural overhead, function schemas, tool results, and image tokens on top. This reference sheet collects the counting rules per provider so you can estimate the real billed total before you send a request.
How chat-format token counting works
A chat request is a list of messages, each wrapped in structural tokens:
total = sum(per_message_overhead + content_tokens)
+ tool_schema_tokens
+ image_tokens
+ reply_priming
The per-message overhead (role marker plus delimiters) is a handful of tokens each; the reply priming is a small fixed amount added once. The big, easily forgotten contributors are tool/function schemas (counted every request) and images (counted by resolution), both billed as input tokens.
Tips for accurate estimates
- Count tools every time. Every tool definition is re-sent and re-billed on each request — even ones the model never calls. Remove unused tools.
- Right-size image detail. Low-detail images cost a small fixed amount; high-detail images scale with resolution. Downscale before sending.
- Use the real tokenizer for billing. Approximations are fine for planning; use tiktoken or the provider count-tokens endpoint when accuracy matters.