PDF-to-tokens estimator
Long documents are where LLM bills explode. Before you push a hundred-page PDF into a model, this tool tells you how many tokens it contains, what that costs, and whether it even fits the context window, so you can choose intelligently between sending it whole, chunking it, or summarizing first.
How it works
You can estimate in two ways. Paste the extracted text and the tool estimates its token count directly with a character-based heuristic. Or, for quick sizing, enter the page count and words per page, and it estimates tokens from typical document density (about 1.33 tokens per English word). It then prices the input against your chosen model and shows the percentage of that model’s context window the document fills. Everything runs locally; nothing is uploaded.
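The two estimation paths and the pricing step can be sketched roughly as follows. This is a minimal Python illustration, not the tool's actual code: the ratio of 4 characters per token is an assumed rule of thumb for English text, and the function names, the example price, and the example context-window size are hypothetical placeholders you would replace with your model's real figures.

```python
# Assumption: roughly 4 characters per token for English text (rule of thumb).
CHARS_PER_TOKEN = 4.0
# Density figure quoted in the description: about 1.33 tokens per English word.
TOKENS_PER_WORD = 1.33


def tokens_from_text(text: str) -> int:
    """Character-based estimate for pasted text."""
    return round(len(text) / CHARS_PER_TOKEN)


def tokens_from_pages(pages: int, words_per_page: int) -> int:
    """Quick sizing from page count and words per page."""
    return round(pages * words_per_page * TOKENS_PER_WORD)


def input_cost(tokens: int, price_per_million: float) -> float:
    """Input cost, given a price per one million input tokens."""
    return tokens / 1_000_000 * price_per_million


def context_fill_percent(tokens: int, context_window: int) -> float:
    """Share of the model's context window the document occupies."""
    return 100.0 * tokens / context_window


if __name__ == "__main__":
    # Hypothetical inputs: 100 pages at 500 words per page, a price of 3.00
    # per million input tokens, and a 200,000-token context window.
    tokens = tokens_from_pages(100, 500)
    print(tokens)
    print(input_cost(tokens, 3.00))
    print(context_fill_percent(tokens, 200_000))
```

A value above 100 from `context_fill_percent` means the document does not fit in a single call and has to be chunked or summarized first.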
Tips and notes
If a document fits comfortably in the context window and you only need to ask it one or two questions, sending it directly is simple and fine. If you will query it many times, the per-call cost of resending everything dominates — chunk it and retrieve only the relevant passages, or summarize once and reuse the summary. Scanned PDFs have no extractable text and need OCR first, which this estimate does not cover. Treat the token figure as a close estimate and confirm against the provider’s tokenizer for large or budget-critical runs.
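The trade-off between resending, retrieving, and summarizing can be made concrete with some simple arithmetic. The sketch below is illustrative only: it counts input tokens alone and ignores output tokens, embedding or indexing costs, and any provider-side caching discounts. All function names and numbers are hypothetical.

```python
def resend_cost(doc_tokens: int, queries: int, price_per_million: float) -> float:
    """Send the whole document with every query."""
    return queries * doc_tokens / 1_000_000 * price_per_million


def retrieval_cost(tokens_per_query: int, queries: int, price_per_million: float) -> float:
    """Send only the retrieved passages with every query."""
    return queries * tokens_per_query / 1_000_000 * price_per_million


def summarize_cost(doc_tokens: int, summary_tokens: int, queries: int,
                   price_per_million: float) -> float:
    """Read the document once to summarize it, then send the summary per query."""
    one_time = doc_tokens / 1_000_000 * price_per_million
    per_query = summary_tokens / 1_000_000 * price_per_million
    return one_time + queries * per_query


if __name__ == "__main__":
    # Hypothetical scenario: a 66,500-token document, 50 queries, a price of
    # 3.00 per million input tokens, 2,000 retrieved tokens per query, and a
    # 3,000-token summary.
    print(resend_cost(66_500, 50, 3.00))
    print(retrieval_cost(2_000, 50, 3.00))
    print(summarize_cost(66_500, 3_000, 50, 3.00))
```

With one or two queries the three figures stay close together, which is why sending the document directly is fine in that case. The gap widens linearly with the number of queries.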