What is MTEB and why does it matter?

MTEB is the Massive Text Embedding Benchmark, a public suite that scores embedding models across retrieval, classification, clustering, and more. A higher average is a useful signal of general quality, but it is not a substitute for testing on your own domain data.

Do more dimensions always mean better embeddings?

No. Higher-dimensional vectors can capture more nuance, but they cost more to store and are slower to search at scale. Many modern models support dimension reduction (Matryoshka), so you can trade a little quality for much cheaper storage and faster search.

How do I read the price column?

Prices are shown per million input tokens, the standard unit providers bill on. To estimate cost, multiply your total tokens to embed by the price and divide by one million. Open-source models show as self-hosted, where you pay infrastructure rather than per token.

Which model is best for retrieval-augmented generation?

For RAG, prioritize retrieval performance and max input tokens so you can embed larger chunks. A high MTEB retrieval score matters most, along with strong multilingual support if your content is not only in English; the table lets you filter to strong general models and compare their context limits.

Are these numbers exact and current?

Not necessarily. They are representative figures meant to guide a shortlist, and providers update models and pricing frequently. Always confirm the latest dimensions, limits, and prices in the provider's own documentation before committing.

What is the Embedding Model Comparison Table?

A filterable reference table of popular text embedding models from OpenAI, Cohere, Voyage AI, Jina, and open-source providers. It compares dimensions, max input tokens, price per million tokens, and benchmark scores so you can pick the right model. It runs free in your browser on Gera Tools, with nothing uploaded.

Name: Embedding Model Comparison Table
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Embedding model comparison table

Choosing a text embedding model means balancing quality, vector size, context limit, and cost. This filterable table puts the popular options — from OpenAI, Cohere, Voyage AI, Jina, and open-source providers — side by side so you can build a shortlist by dimensions, max tokens, price, and benchmark score, then validate the finalists on your own data.

How it works

Each row lists a model with its native embedding dimensions, maximum input tokens, approximate price per million tokens, a representative MTEB-style benchmark score, and notes on multilingual support and dimension flexibility. You pick a task type and optionally filter by cost or dimension size; the table re-ranks and highlights the models best suited to your use. All filtering runs locally in your browser — no data is sent anywhere.
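The filter-then-rank behavior described above can be sketched in a few lines. This is only an illustration of the idea, not the table's actual code: the `ModelRow` fields, the `shortlist` function, and every row in `ROWS` are hypothetical placeholders with made-up numbers, not real models or real scores.

```python
from dataclasses import dataclass


@dataclass
class ModelRow:
    name: str
    dimensions: int
    max_tokens: int
    price_per_million: float  # USD; 0.0 stands in for "self-hosted" here
    scores: dict              # task name -> benchmark-style score


# Placeholder rows with invented numbers, used only to exercise the filter.
ROWS = [
    ModelRow("example-small", 512, 512, 0.02, {"retrieval": 52.0, "clustering": 45.0}),
    ModelRow("example-large", 3072, 8192, 0.13, {"retrieval": 58.0, "clustering": 48.0}),
    ModelRow("example-open", 1024, 8192, 0.0, {"retrieval": 55.0, "clustering": 47.0}),
]


def shortlist(rows, task, max_price=None, max_dimensions=None, min_tokens=0):
    """Keep rows that pass every filter, then rank by the chosen task's score."""
    kept = [
        r for r in rows
        if task in r.scores
        and r.max_tokens >= min_tokens
        and (max_price is None or r.price_per_million <= max_price)
        and (max_dimensions is None or r.dimensions <= max_dimensions)
    ]
    return sorted(kept, key=lambda r: r.scores[task], reverse=True)


for row in shortlist(ROWS, "retrieval", max_price=0.10, min_tokens=1024):
    print(row.name, row.scores["retrieval"])
```

Treating unset filters as `None` keeps every constraint optional, which mirrors picking a task first and narrowing by cost or dimension only when you need to.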

Why the choice of embedding model matters

An embedding model converts a piece of text (a sentence, a paragraph, a document chunk) into a fixed-length numerical vector that represents its meaning. When two pieces of text are semantically similar, their vectors sit close together in the high-dimensional space. How faithfully that proximity tracks meaning is what makes or breaks a semantic search system, a RAG pipeline, a duplicate detector, or a classification model.
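To make "close together" concrete, here is a minimal sketch of cosine similarity, one common way to measure closeness between embedding vectors. The three-dimensional vectors are toy values invented for illustration; real embeddings have hundreds or thousands of dimensions and come from a model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings of three short texts.
refund_request = [0.9, 0.1, 0.0]
money_back = [0.8, 0.2, 0.1]
weather_report = [0.0, 0.2, 0.9]

print(cosine_similarity(refund_request, money_back))      # high: similar meaning
print(cosine_similarity(refund_request, weather_report))  # low: unrelated meaning
```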

Different models make different trade-offs that affect your application in tangible ways:

Dimensions. A model that produces 1,536-dimensional vectors stores three times as much data per document as a 512-dimensional model, and the cost of each similarity comparison grows roughly in proportion to dimension. For a million-document index the storage and query-latency difference is real: at 4 bytes per float32 value, that is about 6.1 GB of raw vectors versus about 2.0 GB. Matryoshka-trained models let you truncate to a lower dimension at inference time with only a small quality penalty, which can cut infrastructure cost significantly.
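A rough sketch of the storage arithmetic and of Matryoshka-style truncation follows. It assumes vectors are stored as 4-byte float32 values and counts raw vector data only, ignoring any index overhead. The function names are illustrative, and truncation is only meaningful for models trained to support it.

```python
import math

BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as float32


def index_size_bytes(num_vectors, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw vector storage only; real indexes add their own overhead."""
    return num_vectors * dimensions * bytes_per_value


def truncate_and_renormalize(vector, target_dim):
    """Keep the first target_dim values and rescale to unit length."""
    head = vector[:target_dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head
    return [x / norm for x in head]


docs = 1_000_000
for dims in (512, 1536):
    print(dims, "dims:", index_size_bytes(docs, dims) / 1e9, "GB")
```

Renormalizing after truncation keeps cosine and dot-product scores comparable across the shortened vectors.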

Max input tokens. If your documents are long — legal contracts, research papers, lengthy product descriptions — a model capped at 512 tokens forces you to split aggressively, creating chunk boundaries that can break context. A model that handles 8,000 or more tokens can embed a whole section in one pass.
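The effect of the token cap on chunking can be sketched as below. As a loud simplification, this counts whitespace-separated words as "tokens"; a real tokenizer usually produces more tokens than words, so in practice you would count with the tokenizer that matches your model. The function name and the `overlap` parameter are illustrative.

```python
def chunk_by_token_limit(text, max_tokens, overlap=0):
    """Split text into chunks of at most max_tokens whitespace 'tokens'."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    tokens = text.split()  # stand-in for a real tokenizer
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # the last chunk already reaches the end of the text
    return chunks


section = " ".join(f"word{i}" for i in range(1200))
print(len(chunk_by_token_limit(section, 512)))   # several chunks under a small cap
print(len(chunk_by_token_limit(section, 8000)))  # one chunk under a large cap
```

A small overlap between chunks is one common way to soften the broken-context problem at chunk boundaries, at the cost of embedding some text twice.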

MTEB score and task type. The Massive Text Embedding Benchmark scores models on retrieval, clustering, classification, reranking, and more. A model that excels at retrieval may not be the best choice for clustering similar customer-support tickets. Use the task filter to surface models that rank well on your specific use case.

Practical guidance for common use cases

Retrieval-augmented generation (RAG). For RAG your primary concerns are retrieval accuracy (MTEB retrieval sub-score), context window (larger chunks need higher token limits), and inference latency (since you embed every query at runtime). Cost matters too when you are embedding a large document corpus upfront.

Semantic search over a small corpus. When the index is small (tens of thousands of documents or fewer), quality beats cost. A high-scoring model with more dimensions is affordable because storage is cheap at that scale.

High-volume classification or clustering. Here you are embedding millions of inputs and the per-token cost dominates. A smaller, cheaper model that still scores well on classification tasks — or a self-hosted open-source model — may deliver better economics without a meaningful quality drop.

Multilingual applications. Many models are trained primarily on English. If your users write in French, Arabic, or Swahili, check the multilingual column carefully. A model that scores well on English MTEB may drop sharply on non-English retrieval.

Comparing models: a worked decision example

Suppose you are building search over 500,000 customer-support tickets, with queries in English and occasional Spanish. Your priorities are retrieval accuracy, moderate cost, and the ability to handle tickets up to 400 words.

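You would filter to the retrieval task, set max tokens to at least 512, and check the multilingual flag. With common tokenizers a 400-word English ticket comes to roughly 500 to 550 tokens, so a limit above 512 leaves headroom for the longest tickets. From the shortlist you would pick the two highest-scoring models and estimate monthly cost. Assume tickets average about 100 tokens and queries about 50 tokens. If you embed all 500,000 documents once (about 50 million tokens) and then embed perhaps 50,000 queries per month (about 2.5 million tokens), a model priced at $0.10 per million tokens costs about $5 to embed the corpus and roughly $0.25 per month on queries, which is very cheap. You could afford a higher-quality, higher-cost model without it mattering much. If you were embedding 50 million documents, the same model would cost about $500 for the corpus alone, and the arithmetic changes considerably.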
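The cost arithmetic in this example can be reproduced with a short sketch. The average token counts (100 per ticket, 50 per query) are assumptions chosen because they match the $5 and $0.25 figures above; measure your own data before relying on them. The function name is illustrative.

```python
def embedding_cost_usd(num_texts, avg_tokens_per_text, price_per_million_tokens):
    """Total tokens times the per-million price, divided by one million."""
    total_tokens = num_texts * avg_tokens_per_text
    return total_tokens * price_per_million_tokens / 1_000_000


PRICE = 0.10             # USD per million tokens, from the example
AVG_TICKET_TOKENS = 100  # assumption, not a measured value
AVG_QUERY_TOKENS = 50    # assumption, not a measured value

corpus_cost = embedding_cost_usd(500_000, AVG_TICKET_TOKENS, PRICE)
monthly_query_cost = embedding_cost_usd(50_000, AVG_QUERY_TOKENS, PRICE)
large_corpus_cost = embedding_cost_usd(50_000_000, AVG_TICKET_TOKENS, PRICE)

print(f"500k-ticket corpus: ${corpus_cost:.2f}")        # $5.00
print(f"monthly queries:    ${monthly_query_cost:.2f}")  # $0.25
print(f"50M-ticket corpus:  ${large_corpus_cost:.2f}")   # $500.00
```

At the larger scale the one-off corpus cost grows a hundredfold, which is where a cheaper or self-hosted model starts to matter.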

Notes and caveats

Benchmark scores guide, not decide. A high MTEB average suggests general quality, but domain-specific performance can differ — always test on your data.
Watch the dimension cost. Larger vectors cost more to store and search; Matryoshka-capable models let you shorten vectors with minimal quality loss.
Mind the context limit. For RAG with large chunks, max input tokens can matter more than a small benchmark difference.
Verify pricing. Providers change models and prices often; confirm current figures on the official docs before you commit.