GitHub repo token estimator
Before you pipe a whole codebase into a long-context model, it helps to know roughly how many tokens it contains, both to decide whether it fits and to estimate cost. The GitHub repo token estimator reads a public repository’s file tree (not its contents), sums file sizes by type, and converts bytes to an estimated token count so you can plan chunking and budget up front.
How it works
You paste a repo URL. The tool calls GitHub’s public REST API to look up the repository’s default branch, then requests that branch’s recursive git tree, which lists every file with its size in bytes. It groups files by extension, optionally filters to the extensions you care about, and converts total bytes to tokens using a code-aware ratio of about 3.5 characters per token. It then compares the total against the context window you select and reports whether the repo fits or needs to be chunked. Everything runs in your browser against the public API, and no file contents are downloaded.
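The pipeline above can be sketched in TypeScript. The sketch assumes two public GitHub endpoints. The first is `GET /repos/{owner}/{repo}`, whose `default_branch` field names the branch. The second is `GET /repos/{owner}/{repo}/git/trees/{branch}?recursive=1`, whose `tree` entries carry a `size` for files (`blob`). The names `fetchTree`, `extensionOf`, and `estimate` are hypothetical, and so are the rules that dotfiles count as extensionless and that each byte counts as one character (true for ASCII source). The tool's own code may differ. GitHub sets `truncated` on the tree response when it caps a recursive listing, and in that case any estimate is a lower bound.

```typescript
// Sketch of the estimator pipeline. Hypothetical names: TreeEntry, Estimate,
// fetchTree, extensionOf, estimate. Only the two GitHub endpoints are real.

interface TreeEntry {
  path: string;
  type: string; // "blob" = file, "tree" = directory, "commit" = submodule
  size?: number; // bytes; GitHub reports this for blobs only
}

interface Estimate {
  bytesByExt: Map<string, number>;
  totalBytes: number;
  tokens: number;
  fits: boolean;
}

const CHARS_PER_TOKEN = 3.5; // code-aware ratio; assumes 1 byte = 1 character

// Look up the default branch, then list its full tree (sizes only, no file contents).
async function fetchTree(owner: string, repo: string): Promise<TreeEntry[]> {
  const base = `https://api.github.com/repos/${owner}/${repo}`;
  const headers = { Accept: "application/vnd.github+json" };

  const repoRes = await fetch(base, { headers });
  if (!repoRes.ok) throw new Error(`repo lookup failed: HTTP ${repoRes.status}`);
  const meta = (await repoRes.json()) as { default_branch: string };

  const treeRes = await fetch(`${base}/git/trees/${meta.default_branch}?recursive=1`, { headers });
  if (!treeRes.ok) throw new Error(`tree lookup failed: HTTP ${treeRes.status}`);
  const body = (await treeRes.json()) as { tree: TreeEntry[]; truncated: boolean };
  if (body.truncated) {
    // GitHub caps recursive listings; a truncated tree undercounts the repo.
    console.warn("tree truncated: estimate is a lower bound");
  }
  return body.tree;
}

// Lowercased extension of the file name; dotfiles and extensionless files map to "(none)".
function extensionOf(path: string): string {
  const name = path.slice(path.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  return dot > 0 ? name.slice(dot + 1).toLowerCase() : "(none)";
}

// Sum blob sizes by extension (optionally keeping only some), convert to tokens, check fit.
function estimate(entries: TreeEntry[], contextWindow: number, keepExts?: Set<string>): Estimate {
  const bytesByExt = new Map<string, number>();
  let totalBytes = 0;
  for (const e of entries) {
    if (e.type !== "blob" || e.size === undefined) continue; // skip directories and submodules
    const ext = extensionOf(e.path);
    if (keepExts && !keepExts.has(ext)) continue;
    bytesByExt.set(ext, (bytesByExt.get(ext) ?? 0) + e.size);
    totalBytes += e.size;
  }
  const tokens = Math.ceil(totalBytes / CHARS_PER_TOKEN);
  return { bytesByExt, totalBytes, tokens, fits: tokens <= contextWindow };
}

// Usage with made-up entries (no network call):
const demo: TreeEntry[] = [
  { path: "src/main.ts", type: "blob", size: 7000 },
  { path: "src", type: "tree" },
  { path: "README.md", type: "blob", size: 3500 },
];
const result = estimate(demo, 128_000);
console.log(result.totalBytes, result.tokens, result.fits); // 10500 3000 true
```

Rounding up with `Math.ceil` keeps the figure slightly conservative. The `bytesByExt` map holds the per-extension totals that a breakdown view would display.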
Tips and notes
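- Filter aggressively. Excluding lockfiles, dist/, and binary assets often cuts the estimate by half or more.
- Treat the number as a planning figure, not a bill. English-heavy files such as docs and long comments usually average more than 3.5 characters per token, so the estimate trends high for them; dense or minified code packs fewer characters into each token, so the estimate trends low.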
- Mind the rate limit. Unauthenticated GitHub API requests are limited to 60 per hour per IP address, so space out your requests if you are estimating many repos in a row.
- Chunk by directory. If the repo overflows your window, the per-extension breakdown shows which file types account for the bulk, which helps you decide where to split.
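The filtering and chunking tips can be combined in a short sketch. It drops lockfiles, build output, and binary assets. It then totals estimated tokens per top-level directory and packs directories into chunks that fit a token budget. Everything here is an illustrative assumption rather than the tool's behavior: the exclusion sets, the helper names `isExcluded`, `tokensByTopDir`, and `packDirectories`, the per-directory grouping, and the first-fit-decreasing packing strategy.

```typescript
// Sketch of the filtering and chunking tips. Hypothetical names and rules:
// the exclusion sets, isExcluded, tokensByTopDir, packDirectories.

interface FileEntry {
  path: string;
  size: number; // bytes, as reported for blobs in the git tree
}

const CHARS_PER_TOKEN = 3.5;

// Illustrative exclusion rules (not exhaustive): lockfiles, build output, binaries.
const EXCLUDED_NAMES = new Set(["package-lock.json", "yarn.lock", "pnpm-lock.yaml", "Cargo.lock"]);
const EXCLUDED_DIRS = new Set(["dist", "build", "node_modules"]);
const BINARY_EXTS = new Set(["png", "jpg", "gif", "pdf", "zip", "woff2"]);

function isExcluded(path: string): boolean {
  const parts = path.split("/");
  const name = parts[parts.length - 1];
  const dot = name.lastIndexOf(".");
  const ext = dot > 0 ? name.slice(dot + 1).toLowerCase() : "";
  return (
    EXCLUDED_NAMES.has(name) ||
    parts.slice(0, -1).some((dir) => EXCLUDED_DIRS.has(dir)) || // any excluded parent directory
    BINARY_EXTS.has(ext)
  );
}

// Estimated tokens per top-level directory; root-level files are grouped under ".".
function tokensByTopDir(files: FileEntry[]): Map<string, number> {
  const bytes = new Map<string, number>();
  for (const f of files) {
    if (isExcluded(f.path)) continue;
    const slash = f.path.indexOf("/");
    const dir = slash === -1 ? "." : f.path.slice(0, slash);
    bytes.set(dir, (bytes.get(dir) ?? 0) + f.size);
  }
  const tokens = new Map<string, number>();
  bytes.forEach((b, dir) => tokens.set(dir, Math.ceil(b / CHARS_PER_TOKEN)));
  return tokens;
}

// First-fit decreasing: place the largest directories first, each into the first
// chunk with room. A directory bigger than the budget ends up alone in an
// oversized chunk and must be split one level deeper by hand.
function packDirectories(tokens: Map<string, number>, budget: number): string[][] {
  const chunks: { dirs: string[]; used: number }[] = [];
  const sorted = Array.from(tokens.entries()).sort((a, b) => b[1] - a[1]);
  for (const [dir, t] of sorted) {
    const chunk = chunks.find((c) => c.used + t <= budget);
    if (chunk) {
      chunk.dirs.push(dir);
      chunk.used += t;
    } else {
      chunks.push({ dirs: [dir], used: t });
    }
  }
  return chunks.map((c) => c.dirs);
}

// Usage with made-up sizes and a 3,500-token budget (dist/ is filtered out):
const files: FileEntry[] = [
  { path: "src/a.ts", size: 10500 },
  { path: "docs/guide.md", size: 7000 },
  { path: "dist/bundle.js", size: 35000 },
];
console.log(JSON.stringify(packDirectories(tokensByTopDir(files), 3500))); // [["src"],["docs"]]
```

Packing whole directories keeps related files in the same chunk, which usually matches how a codebase is organized. The cost is some unused room in each chunk compared with splitting at the file level.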