Why is my token count higher than the text alone?

Chat APIs wrap every message in structural tokens (role markers, message delimiters) and add a few priming tokens for the reply. Function definitions, tool results, and images all add tokens on top of your visible text, so the billed total is always higher than counting your prompt text alone.

How many tokens does an image cost?

It depends on resolution and detail setting. OpenAI bills images by tiling — a base cost plus per-512px-tile cost at high detail, or a fixed low cost at low detail. Anthropic approximates image tokens as roughly (width x height) / 750. Both treat images as input tokens.

Do function/tool definitions count as tokens?

Yes. The JSON schema for every tool you pass is serialized and counted as input tokens on each request, even if the model does not call the tool. Trimming unused tools and shortening descriptions directly cuts cost.

Are these rules exact?

They are documented approximations that change with model versions. For billing-grade accuracy use the provider's own tokenizer (tiktoken for OpenAI) or the count-tokens endpoint where available.

Does this sheet send anything to a server?

No. It is a static reference rendered in your browser. Selecting a model just filters which rules are shown.

What is the Token Counting Formula Reference Sheet?

Interactive cheat sheet covering token counting rules for chat format overhead, function/tool calls, images, system prompts, and tool results across OpenAI, Anthropic, and Google LLM APIs. Pick a model to see the rules that apply. It runs free in your browser on Gera Tools, with nothing uploaded.

Token Counting Formula Reference Sheet

Name: Token Counting Formula Reference Sheet
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Token counting, demystified

The token count you are billed for is almost never just your prompt text. Chat APIs add structural overhead, function schemas, tool results, and image tokens on top. This reference sheet collects the counting rules per provider so you can estimate the real billed total before you send a request.

Why the billed count differs from your text count

Most developers count the tokens in their visible prompt text and assume that is what they are billed for. In practice, a chat API request includes several invisible contributors:

Message format overhead: each message in a chat array is wrapped with role markers and delimiters that add a small, fixed number of tokens per message
Tool/function schemas: the JSON schema for every tool you pass is serialized and counted as input tokens on every request, even if the model never calls those tools
Images: billed based on pixel dimensions, not just “one image”
Reply priming: a small fixed amount added to start the model’s reply
Tool results: when you return tool output back to the model, those tokens are counted as input

For a simple single-message request with no tools or images, the overhead is small. For an agentic loop with 10 tool definitions, conversation history, and a high-resolution image, the overhead can easily double the raw text count.

How chat-format token counting works

A chat request is a list of messages, each wrapped in structural tokens:

total = sum(per_message_overhead + content_tokens)
      + tool_schema_tokens
      + image_tokens
      + reply_priming

The per-message overhead (role marker plus delimiters) is a handful of tokens each; the reply priming is a small fixed amount added once.

Provider-specific rules

OpenAI (GPT-4o, o1, o3 family)

Per-message overhead: typically 3–4 tokens per message plus role
Tool schemas: counted as input tokens from the serialized JSON schema of each function definition
Images (high detail): a base cost plus a per-tile cost for each 512×512 pixel tile in the resized image
Images (low detail): a small fixed cost regardless of size
Reply priming: a small fixed token count added to start the assistant turn

Anthropic (Claude family)

Per-message overhead: small fixed amount per message in the messages array
Tool schemas: input_tokens includes the serialized tool definitions passed in the tools parameter
Images: approximately (width × height) / 750 tokens, billed as input tokens
The API’s usage object returns exact input and output token counts per call

Google (Gemini family)

Token counting is available via the countTokens API call before sending the full request
Multimodal inputs (images, video, audio) each have their own token-counting rules documented in the API reference

Tips for accurate estimates

Count tools every time. Every tool definition is re-sent and re-billed on each request — even ones the model never calls. Remove unused tools to save real money.
Right-size image detail. Low-detail images cost a small fixed amount; high-detail images scale with resolution. Downscale before sending if fine detail is not needed.
Use the real tokenizer for billing. Approximations are fine for planning; use tiktoken (OpenAI) or the provider count-tokens endpoint when accuracy matters.
Log the usage object. All three providers return token counts in the API response — log them per call and aggregate in your monitoring to catch unexpected overhead.