Embedding model comparison table
Choosing a text embedding model means balancing quality, vector size, context limit, and cost. This filterable table puts the popular options — from OpenAI, Cohere, Voyage AI, Jina, and open-source providers — side by side so you can build a shortlist by dimensions, max tokens, price, and benchmark score, then validate the finalists on your own data.
How it works
Each row lists a model with its native embedding dimensions, maximum input tokens, approximate price per million tokens, a representative MTEB-style benchmark score, and notes on multilingual support and dimension flexibility. You pick a task type and optionally filter by cost or dimension size; the table then re-ranks the remaining rows and highlights the models best suited to your use case. All filtering runs locally in your browser, so no data is sent anywhere.
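The filter-then-rank step described above can be sketched in a few lines of Python. This is an illustrative sketch, not the table's actual code: the `ModelRow` fields, the `shortlist` function, and every row in `ROWS` are hypothetical placeholders, and the numbers are made up rather than taken from any real model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ModelRow:
    """One table row. All field names are assumptions for this sketch."""
    name: str
    dimensions: int
    max_tokens: int
    price_per_million: float      # USD per million tokens; 0.0 for self-hosted
    task_scores: Dict[str, float]  # benchmark score per task type


# Placeholder rows with invented numbers, NOT real model data.
ROWS = [
    ModelRow("model-a", 3072, 8192, 0.13, {"retrieval": 64.0, "clustering": 58.0}),
    ModelRow("model-b", 1024, 512, 0.10, {"retrieval": 63.0, "clustering": 60.0}),
    ModelRow("model-c", 768, 8192, 0.02, {"retrieval": 62.0, "clustering": 55.0}),
    ModelRow("model-d", 384, 512, 0.0, {"retrieval": 56.0, "clustering": 57.0}),
]


def shortlist(
    rows: List[ModelRow],
    task: str,
    max_price: Optional[float] = None,
    max_dimensions: Optional[int] = None,
    min_tokens: Optional[int] = None,
) -> List[ModelRow]:
    """Drop rows that fail any active filter, then rank by the task's score."""
    kept = [
        r for r in rows
        if task in r.task_scores
        and (max_price is None or r.price_per_million <= max_price)
        and (max_dimensions is None or r.dimensions <= max_dimensions)
        and (min_tokens is None or r.max_tokens >= min_tokens)
    ]
    # Highest score for the chosen task first.
    return sorted(kept, key=lambda r: r.task_scores[task], reverse=True)


if __name__ == "__main__":
    for row in shortlist(ROWS, "retrieval", max_price=0.10, max_dimensions=1024):
        print(row.name, row.task_scores["retrieval"])
```

Filters left as `None` are simply skipped, which mirrors the optional cost and dimension filters. The ranking only orders the shortlist; it does not replace testing the finalists on your own data.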
Notes and caveats
- Benchmark scores guide, not decide. A high MTEB average suggests general quality, but domain-specific performance can differ — always test on your data.
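- Watch the dimension cost. Storage and similarity-search cost grow roughly in proportion to vector length, so a 3072-dimension vector takes about four times the space of a 768-dimension one. Matryoshka-capable models let you shorten vectors with minimal quality loss.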
- Mind the context limit. For RAG with large chunks, max input tokens can matter more than a small benchmark difference.
- Verify pricing. Providers change models and prices often; confirm current figures on the official docs before you commit.
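To put rough numbers on the dimension-cost note, here is a minimal Python sketch. It assumes vectors are stored as 32-bit floats and ignores index overhead, so treat the result as a lower bound. The truncation helper shows the usual way to shorten a Matryoshka-style embedding (keep the leading values, then re-normalize to unit length); it is only meaningful for models trained to support this. Both function names are hypothetical.

```python
import math
from typing import List, Sequence


def raw_index_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw vector storage only; assumes float32 values and no index overhead."""
    return num_vectors * dimensions * bytes_per_value


def truncate_and_normalize(vector: Sequence[float], keep: int) -> List[float]:
    """Keep the first `keep` values and rescale the result to unit length.

    Assumes the embedding comes from a Matryoshka-capable model; truncating
    an ordinary embedding this way can hurt quality badly.
    """
    head = list(vector[:keep])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero prefix cannot be normalized
    return [x / norm for x in head]


if __name__ == "__main__":
    full = raw_index_bytes(1_000_000, 3072)
    short = raw_index_bytes(1_000_000, 256)
    print(f"3072 dims: {full / 1e9:.2f} GB, 256 dims: {short / 1e9:.2f} GB")
    print(truncate_and_normalize([3.0, 4.0, 12.0], keep=2))
```

For one million vectors, 3072 dimensions come to about 12.3 GB of raw float32 data, while 256 dimensions come to about 1.0 GB. Whether the smaller size keeps enough quality is something to measure on your own data.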