CSV / Tabular Data Token Estimator

Estimate tokens when sending CSV or table data in an LLM prompt


Send tabular data to an LLM for the fewest tokens

The same table can cost very different numbers of tokens depending on how you serialise it into a prompt. This estimator takes your CSV and shows the token count for four common representations (raw CSV, JSON, markdown table, and key-value) so you can pick the cheapest one your model handles well.

How the formats compare

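Token cost is driven by repeated structural characters. CSV states each column header once and separates values with a single comma. A markdown table also states each header once, but it wraps every cell in pipes and spaces and adds a separator row. JSON repeats every key on every row: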

csv:      name,age\nAda,36
json:     [{"name":"Ada","age":36}]
markdown: | name | age |\n| --- | --- |\n| Ada | 36 |

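JSON pays its key overhead on every row, and that overhead grows with the number of columns, so for wide or tall tables it can cost several times as many tokens as CSV for identical data. The estimator approximates tokens at roughly four characters per token, the standard rule of thumb for English-like text. Real tokenizers can split punctuation-heavy or numeric data differently, so treat the counts as estimates rather than exact figures.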
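The comparison can be reproduced with a short script. The following is a minimal Python sketch, not the estimator's actual code: the function names are hypothetical, the key-value layout (one "key: value" line per field, blank line between rows) is an assumption because this page does not show that format, and the token count is only the four-characters-per-token heuristic described above.

```python
import csv
import io
import json
import math


def to_csv(rows):
    """Header once, then one comma-separated line per row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().rstrip("\n")


def to_json(rows):
    """Compact JSON array of objects: every key repeats on every row."""
    return json.dumps(rows, separators=(",", ":"))


def to_markdown(rows):
    """Pipe table with a header row and a separator row."""
    cols = list(rows[0])
    lines = [
        "| " + " | ".join(cols) + " |",
        "| " + " | ".join("---" for _ in cols) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[c]) for c in cols) + " |")
    return "\n".join(lines)


def to_key_value(rows):
    """Assumed layout: 'key: value' per line, blank line between rows."""
    return "\n\n".join(
        "\n".join(f"{key}: {value}" for key, value in row.items())
        for row in rows
    )


def estimate_tokens(text, chars_per_token=4):
    """Rule-of-thumb estimate: characters divided by four, rounded up."""
    return math.ceil(len(text) / chars_per_token)


def compare_formats(rows):
    """Return the estimated token count for each representation."""
    return {
        "csv": estimate_tokens(to_csv(rows)),
        "json": estimate_tokens(to_json(rows)),
        "markdown": estimate_tokens(to_markdown(rows)),
        "key_value": estimate_tokens(to_key_value(rows)),
    }


if __name__ == "__main__":
    print(compare_formats([{"name": "Ada", "age": 36}]))
```

For the single-row example above, this sketch estimates 4 tokens for CSV and 7 for JSON. The gap widens as rows are added, because CSV pays for the header once while JSON pays for the keys on every row.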

Tips to cut tabular token cost

  • Prefer raw CSV for pure data transfer; it is consistently the cheapest of the four formats.
  • Send only the columns and rows you need. Dropping unused columns cuts the per-row cost across the whole table.
  • Abbreviate headers when the model does not need full names; a header that appears once is cheap in CSV but expensive when repeated in JSON.
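
The last two tips can be sketched in Python as follows. This is an illustration under assumptions: the helper names and the sample column names (customer_name, customer_age, notes) are hypothetical, and the CSV length function is a simplification that does not quote values containing commas.

```python
import json


def select_columns(rows, keep):
    """Keep only the named columns, in the given order."""
    return [{key: row[key] for key in keep} for row in rows]


def abbreviate_headers(rows, aliases):
    """Rename columns using an {old_name: short_name} mapping."""
    return [
        {aliases.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


def csv_length(rows):
    """Character count of a naive CSV rendering (no quoting)."""
    cols = list(rows[0])
    lines = [",".join(cols)]
    lines += [",".join(str(row[c]) for c in cols) for row in rows]
    return len("\n".join(lines))


def json_length(rows):
    """Character count of a compact JSON rendering."""
    return len(json.dumps(rows, separators=(",", ":")))


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    rows = [
        {"customer_name": "Ada", "customer_age": 36, "notes": "n/a"}
        for _ in range(3)
    ]
    trimmed = select_columns(rows, ["customer_name", "customer_age"])
    short = abbreviate_headers(
        trimmed, {"customer_name": "name", "customer_age": "age"}
    )
    print("CSV chars saved by renaming: ", csv_length(trimmed) - csv_length(short))
    print("JSON chars saved by renaming:", json_length(trimmed) - json_length(short))
```

In CSV the saving from shorter headers is a fixed number of characters, since the header line appears once. In JSON the same saving is multiplied by the number of rows, which is why abbreviation matters most when you cannot avoid JSON.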