Embedding Model Comparison Table

Compare text embedding models by dimensions, cost, max tokens, and MTEB score.


Compare embedding models at a glance

Choosing a text embedding model means balancing retrieval quality, cost, and vector size. This table puts the major options side by side — OpenAI, Cohere, Voyage, and leading open-source models — with their dimensions, maximum input length, price per million tokens, and MTEB benchmark score, so you can pick the right model for your RAG or search index.

How to read the table

  • Dimensions is the length of each output vector. Storage and search cost grow linearly with it (one 1,536-dimension float32 vector takes 6,144 bytes), and larger vectors can capture finer distinctions between texts.
  • Max tokens is the longest single input the model accepts; longer documents must be split into chunks before embedding.
  • Cost / 1M tokens is the embedding price. Embeddings are far cheaper than generation, but the cost adds up once you embed billions of tokens.
  • MTEB is the average score on the Massive Text Embedding Benchmark. Treat differences of a point or two as noise; large gaps are meaningful.
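To see how the dimensions, max tokens, and price columns combine for a real corpus, here is a back-of-envelope estimator. It is a minimal sketch, assuming float32 vectors (4 bytes per dimension), non-overlapping chunks, and one vector per chunk; the function name `estimate_index` and every number in the example call are hypothetical placeholders, so substitute the values from the table for the model you are considering.

```python
import math


def estimate_index(num_docs: int, avg_doc_tokens: int, max_tokens: int,
                   dims: int, price_per_million: float,
                   bytes_per_dim: int = 4) -> dict:
    """Rough vector count, raw storage, and one-time embedding cost for a corpus.

    Assumptions (hypothetical, adjust to your setup):
    - every document is split into ceil(avg_doc_tokens / max_tokens)
      non-overlapping chunks, and each chunk becomes one vector;
    - vectors are stored as float32 (4 bytes per dimension) with no index overhead.
    """
    chunks_per_doc = math.ceil(avg_doc_tokens / max_tokens)
    num_vectors = num_docs * chunks_per_doc
    total_tokens = num_docs * avg_doc_tokens  # non-overlapping chunks add no extra tokens
    return {
        "vectors": num_vectors,
        "storage_gb": num_vectors * dims * bytes_per_dim / 1e9,
        "embed_cost_usd": total_tokens / 1_000_000 * price_per_million,
    }


if __name__ == "__main__":
    # Placeholder inputs: 1M documents of ~2,000 tokens, an 8,192-token limit,
    # 1,536 dimensions, and $0.02 per 1M tokens. Replace with real table values.
    est = estimate_index(1_000_000, 2_000, 8_192, 1_536, 0.02)
    print(f"{est['vectors']:,} vectors, "
          f"{est['storage_gb']:.2f} GB raw, "
          f"${est['embed_cost_usd']:.2f} to embed")
    # prints: 1,000,000 vectors, 6.14 GB raw, $40.00 to embed
```

Raw storage scales with vectors × dimensions, so halving the dimensions halves it, while the embedding bill depends only on the number of tokens. Real vector indexes (for example HNSW graphs) add overhead on top of the raw figure.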
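The max-tokens limit is the reason documents get chunked. A minimal sketch of fixed-size sliding-window chunking, assuming the text has already been converted to token IDs with the model's own tokenizer (not shown); `chunk_tokens` and its `overlap` parameter are illustrative names, and many real pipelines split on sentence or paragraph boundaries instead of raw token counts.

```python
def chunk_tokens(tokens: list[int], max_tokens: int,
                 overlap: int = 0) -> list[list[int]]:
    """Split a token-ID sequence into windows of at most max_tokens.

    Consecutive windows share `overlap` tokens, so context near a boundary
    appears in both neighbouring chunks.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the sequence
    return chunks


if __name__ == "__main__":
    toy = list(range(10))  # stand-in for real token IDs
    print(chunk_tokens(toy, max_tokens=4, overlap=1))
    # prints: [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

Overlap means some tokens are embedded more than once, which slightly raises both the vector count and the embedding cost.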

Tips for picking a model

  • For most production RAG, a mid-tier model such as text-embedding-3-small or voyage-3-lite offers the best quality per dollar.
  • If storage dominates your bill, prefer a model that supports dimension truncation and store 256–512 dims.
  • Always re-embed your whole corpus when you change models: vectors from different models live in different spaces and cannot be compared or mixed in one index.
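Dimension truncation only works well for models trained so that the leading components carry most of the signal (Matryoshka-style training; OpenAI's text-embedding-3 models are one example and also accept a `dimensions` parameter at request time). If you shorten vectors yourself, a common approach is to keep the first k components and re-normalize them to unit length so cosine and dot-product scores stay consistent. The sketch below is plain Python; `truncate_and_normalize` is a hypothetical helper, and applying it to a model that was not trained for truncation can noticeably degrade retrieval.

```python
import math


def truncate_and_normalize(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and rescale them to unit L2 length.

    Only meaningful for embeddings trained to support truncation (assumption).
    """
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in head]


if __name__ == "__main__":
    full = [3.0, 4.0, 12.0]  # toy 3-dimension "embedding"
    print(truncate_and_normalize(full, 2))
    # prints: [0.6, 0.8]
```

Apply the same truncation to query and document vectors: a 256-dimension query cannot be scored against 1,536-dimension documents.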