MTEB is the Massive Text Embedding Benchmark, a public leaderboard scoring embedding models across dozens of retrieval, clustering, and classification tasks. A higher average MTEB score generally indicates better embedding quality.

Do more dimensions always mean better quality?

No. More dimensions can capture more nuance but cost more to store and search, and some lower-dimension models outperform higher-dimension ones on MTEB. Match the dimension count to your storage budget and recall needs.

Are these prices current?

Prices are list-price estimates and clearly labelled. Providers change pricing frequently, so confirm the current rate in your provider's dashboard before budgeting.

Can I shorten OpenAI embedding dimensions?

Yes. text-embedding-3 models support Matryoshka dimension truncation, so you can request fewer dimensions (e.g. 256 or 512) to cut storage cost with only a small quality loss.

What is the Embedding Model Comparison Table?

It is a searchable, sortable comparison of major text embedding models covering dimensions, max input tokens, cost per 1M tokens, and MTEB benchmark scores for OpenAI, Cohere, Voyage, and open-source options. It runs free in your browser on Gera Tools, with nothing uploaded.

Embedding Model Comparison Table

Name: Embedding Model Comparison Table
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Compare embedding models at a glance

Choosing a text embedding model means balancing retrieval quality, cost, and vector size. This table puts the major options side by side — OpenAI, Cohere, Voyage, and leading open-source models — with their dimensions, maximum input length, price per million tokens, and MTEB benchmark score, so you can pick the right model for your RAG or search index.

How to read the table

Dimensions is the length of each output vector. Larger vectors cost more to store and search but can encode more meaning.
Max tokens is the longest input the model accepts in a single call; longer documents must be chunked.
Cost / 1M tokens is the embedding price. Embeddings are far cheaper than generation, but at billions of tokens the cost adds up.
MTEB is the average benchmark score. Treat differences of a point or two as noise; large gaps are meaningful.

Filter by provider, search by model name, and click a column header to sort.
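Because prices are quoted per million tokens, a budget estimate is total tokens × price ÷ 1,000,000. The sketch below is a minimal illustration under that assumption. The function name and the $0.02 rate are hypothetical placeholders, not a quote for any model in the table, so substitute the current list price from your provider.

```python
def embedding_cost(total_tokens: int, price_per_million: float) -> float:
    """Estimate embedding cost in dollars from a per-1M-token list price."""
    return total_tokens / 1_000_000 * price_per_million


# Hypothetical example: a 2-billion-token corpus at a placeholder $0.02 per 1M tokens.
corpus_tokens = 2_000_000_000
print(f"${embedding_cost(corpus_tokens, 0.02):,.2f}")
```

Switching models means re-embedding the whole corpus, so this cost is paid again on every migration.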

What MTEB measures — and its limits

MTEB (Massive Text Embedding Benchmark) scores a model across a wide range of tasks, including semantic textual similarity, retrieval, reranking, clustering, classification, pair classification, and summarization. A higher average MTEB score generally means better general-purpose embedding quality.

However, MTEB is a general benchmark. A model with a slightly lower MTEB score can outperform a higher-ranked model on a specific domain if that domain is well-represented in its training data. For production use cases on specialized content (legal, medical, code, or a narrow product catalog), always validate on a sample of your actual data rather than relying solely on the leaderboard position.

Key dimensions to compare

Dimensions and storage

Vectors from a 1,536-dimension model take half the storage of vectors from a 3,072-dimension model. At scale, this matters: ten million 3,072-dimension float32 vectors (4 bytes per value) require roughly 120 GB of raw storage before indexing overhead. The same corpus at 1,536 dimensions needs about 60 GB.
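The storage figures above follow from vectors × dimensions × 4 bytes per float32 value. A minimal sketch of that arithmetic, assuming uncompressed float32 storage and decimal gigabytes (10^9 bytes); it ignores index structures, metadata, and any quantization your vector database may apply. The function name is hypothetical.

```python
BYTES_PER_FLOAT32 = 4


def raw_vector_storage_gb(num_vectors: int, dims: int,
                          bytes_per_value: int = BYTES_PER_FLOAT32) -> float:
    """Raw vector storage in decimal GB (1 GB = 10**9 bytes), excluding index overhead."""
    return num_vectors * dims * bytes_per_value / 1e9


# Compare full-size and truncated vectors for a 10-million-document corpus.
for dims in (3072, 1536, 512, 256):
    print(f"{dims:>5} dims: {raw_vector_storage_gb(10_000_000, dims):6.1f} GB")
```

For 10 million vectors this prints about 122.9 GB at 3,072 dimensions and 61.4 GB at 1,536, in line with the rounded figures above.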

Some models (notably the OpenAI text-embedding-3 series) support Matryoshka dimension truncation: you can request a shorter vector, such as 256, 512, 1,024, or 1,536 dimensions, from the same model. Shorter vectors cost less to store and search with only a modest quality loss, even at 512 dimensions or fewer, which makes truncation a powerful cost lever for large corpora.
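Truncation keeps the leading coordinates of a Matryoshka-style vector, and re-normalizing to unit length keeps cosine and dot-product scores on the same scale. A minimal sketch, assuming you already hold a full-length embedding from a model trained for truncation (on models that are not, the shortened vector may lose much more quality). The function name and toy vector are hypothetical.

```python
import math
from typing import Sequence


def truncate_embedding(vec: Sequence[float], dims: int) -> list[float]:
    """Keep the first `dims` values of a Matryoshka-style embedding and L2-normalize them."""
    if not 0 < dims <= len(vec):
        raise ValueError(f"dims must be between 1 and {len(vec)}")
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


# Toy 4-dimensional vector standing in for a real 1,536- or 3,072-dim embedding.
full = [3.0, 4.0, 12.0, 0.0]
short = truncate_embedding(full, 2)
print(short)
```

Only compare a truncated vector against other vectors from the same model truncated to the same length.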

Maximum input tokens

Models with longer context windows (8,000+ tokens) can embed several pages of text as a single vector without chunking. This simplifies retrieval pipelines but produces coarser representations, because one fixed-length vector has to summarize more content. Models with shorter context windows (512 tokens) force finer chunking, which often yields more precise retrieval at the cost of storing more vectors.
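Finer chunking usually means splitting each document into fixed-size token windows, often with a small overlap so a sentence cut at a boundary still appears whole in one chunk. A minimal sketch; the function name is hypothetical, and whitespace splitting is only a stand-in, since real context limits are counted in the model's own subword tokens.

```python
def chunk_tokens(tokens: list[str], max_tokens: int, overlap: int = 0) -> list[list[str]]:
    """Split tokens into windows of at most `max_tokens`, sharing `overlap` tokens between neighbours."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Whitespace tokens as a stand-in for a 1,200-token document and a 512-token model.
words = ("embedding " * 1200).split()
chunks = chunk_tokens(words, max_tokens=512, overlap=64)
print(len(chunks), [len(c) for c in chunks])
```

Each chunk becomes its own vector, which is where the larger vector count comes from.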

Self-hosted vs. API

Self-hosted models (common in the open-source category) have zero per-token API cost. The trade-off is that you pay for compute infrastructure, take on model versioning and deployment, and may give up some quality relative to the best commercial models, although some open models rank competitively on MTEB. For privacy-sensitive workloads where data cannot leave your infrastructure, self-hosting is often the right call regardless of benchmark differences.

Tips for picking a model

For most production RAG workloads, a mid-tier model usually offers the best quality per dollar.
If storage dominates your bill, prefer a model that supports dimension truncation and store 256–512 dims.
Always re-embed your whole corpus when you change models — vectors from different models are not comparable.
Run a retrieval quality check on 50–100 representative queries against your real documents before committing to a model for production; leaderboard rank is a guide, not a guarantee.
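A retrieval check like the one in the last tip can be as simple as recall@k over a hand-labelled query set. A minimal sketch, assuming you have already run each candidate model's search and collected ranked document IDs per query; the function name, query IDs, and document IDs are hypothetical.

```python
from typing import Mapping, Sequence


def recall_at_k(results: Mapping[str, Sequence[str]],
                relevant: Mapping[str, set[str]], k: int = 5) -> float:
    """Mean fraction of each query's relevant documents found in its top-k results."""
    scores = []
    for query_id, gold in relevant.items():
        if not gold:
            continue  # skip queries with no labelled relevant documents
        top_k = set(results.get(query_id, [])[:k])
        scores.append(len(top_k & gold) / len(gold))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical labels and ranked results from two candidate models.
relevant = {"q1": {"d3"}, "q2": {"d7", "d9"}}
results_model_a = {"q1": ["d3", "d1"], "q2": ["d7", "d2"]}
results_model_b = {"q1": ["d1", "d2"], "q2": ["d9", "d7"]}
print(recall_at_k(results_model_a, relevant, k=2), recall_at_k(results_model_b, relevant, k=2))
```

Use the same queries, labels, and k for every candidate so the scores are directly comparable.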