Embedding cost calculator
Before you embed a knowledge base for RAG or semantic search, it helps to know what it will cost. Embedding models are cheap per token, but a large corpus adds up, and re-embedding after a model upgrade means paying the full corpus cost again. This tool estimates the token count of your corpus and the total API cost across popular embedding models from OpenAI, Cohere, and Voyage.
How it works
You give the tool a corpus size in one of three ways: paste sample text (it estimates tokens at roughly four characters per token), enter a token count directly, or enter a document count with an average tokens-per-document figure. It then divides the total tokens by one million and multiplies the result by the selected model’s price per million tokens to produce the embedding cost. It also reports how many vectors you will generate, assuming one vector per document, so you can plan vector-database storage alongside the API spend.
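The arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual source: the function names, the example corpus, and the price of $0.10 per million tokens are all hypothetical placeholders, so substitute the current published rate for whichever model you are evaluating.

```python
import math

# Rough heuristic for English prose; an assumption, not a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate the token count of pasted text from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def embedding_cost(total_tokens: int, price_per_million_usd: float) -> float:
    """Cost in USD of embedding total_tokens at a given price per million tokens."""
    return total_tokens / 1_000_000 * price_per_million_usd


# Hypothetical corpus: 50,000 documents averaging 800 tokens each.
doc_count = 50_000
avg_tokens_per_doc = 800
total_tokens = doc_count * avg_tokens_per_doc

# Placeholder price for illustration only, not a quote for any real model.
price_per_million_usd = 0.10

cost = embedding_cost(total_tokens, price_per_million_usd)
print(f"{total_tokens:,} tokens, {doc_count:,} vectors, about ${cost:.2f}")
```

The vector count equals the document count here because the sketch follows the same one-vector-per-document assumption as the tool.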
Tips and notes
- Character-to-token ratio varies. The ~4 chars/token rule holds for English prose; code, non-Latin scripts, and dense JSON tokenize differently, so when precision matters, run your text through the model's real tokenizer and enter that token count directly.
- Budget for re-embedding. Switching models or chunking strategies means paying the full corpus cost again, so factor that into model selection.
- Storage is the other cost. Higher-dimensional vectors cost more to store and search; weigh dimension against retrieval quality.
- Local only. Your text never leaves the browser.
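To put rough numbers on the storage and re-embedding tips, here is a small sketch. It assumes vectors are stored as uncompressed 32-bit floats (4 bytes per dimension) and ignores index overhead, metadata, and replication, all of which vary by vector database. The function names, the dimension counts, and the per-pass cost are hypothetical values chosen for illustration.

```python
BYTES_PER_FLOAT32 = 4  # assumes uncompressed float32 storage


def raw_vector_storage_bytes(num_vectors: int, dimensions: int) -> int:
    """Raw size of the vectors alone, excluding index and metadata overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


def lifetime_embedding_cost(cost_per_pass_usd: float, expected_re_embeds: int) -> float:
    """Initial embedding pass plus one full pass for each expected re-embed."""
    return cost_per_pass_usd * (1 + expected_re_embeds)


num_vectors = 50_000  # hypothetical: one vector per document

# Compare two hypothetical embedding dimensions.
for dims in (1024, 3072):
    size_mb = raw_vector_storage_bytes(num_vectors, dims) / 1_000_000
    print(f"{dims} dimensions: about {size_mb:.1f} MB of raw vectors")

# Hypothetical $4.00 per pass, with two re-embeds expected over the project's life.
print(f"Lifetime embedding spend: ${lifetime_embedding_cost(4.00, 2):.2f}")
```

Tripling the dimension triples the raw storage, which is why it is worth checking whether the retrieval quality gain justifies the larger vectors.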