Why can you do arithmetic on embeddings?

Because embeddings place meaning in a geometric space, consistent relationships become consistent directions. The gap from man to woman points roughly the same way as king to queen, so subtracting and adding those vectors moves you along a meaningful axis like gender, producing analogies as arithmetic.

What is cosine similarity used for?

Cosine similarity measures the angle between two embedding vectors, scoring how related they are regardless of magnitude. It is the standard ranking metric for semantic search and recommendations: embed a query, then return the items whose vectors have the highest cosine similarity to it.

How do embeddings power semantic search?

Instead of matching keywords, semantic search embeds every document once and embeds the query at search time, then returns the documents whose vectors are closest. This finds results by meaning, so a search for "car" can surface a page about "automobiles" even with no shared words.

How do embeddings relate to RAG?

Retrieval-augmented generation uses embeddings to fetch relevant context for a language model. Your documents are embedded and stored; at query time the question is embedded and the nearest chunks are retrieved and pasted into the prompt, grounding the model's answer in your own data.

Vector Embeddings Explained: Meaning as Math

From points to meaning

A vector embedding represents an item as a point in a high-dimensional space where distance and direction encode meaning. This matters because the geometry is not arbitrary: relationships between concepts show up as consistent directions you can manipulate with arithmetic. The most famous demonstration is that king − man + woman lands near queen.

Embedding arithmetic and analogies

When embeddings are trained well, semantic relationships become vector directions. The offset from “man” to “woman” points in roughly the same direction as the offset from “king” to “queen”, which you can read as a gender axis. So if you take the vector for “king,” subtract “man,” and add “woman,” you arrive near “queen.” The same trick captures Paris − France + Italy ≈ Rome (a capital-of relationship), as well as verb tenses, plurals, and comparatives. This regularity suggests that the vectors encode real structure rather than noise.
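The arithmetic above can be sketched with a handful of hand-made vectors. Everything in this example is an assumption for illustration: the three-dimensional TOY_VECTORS table, its axis meanings, and the solve_analogy helper are hypothetical, and real embeddings are learned by a model and have hundreds of dimensions. The input words are excluded from the candidates, which is a common convention when solving analogies this way.

```python
import math

# Hypothetical 3-d toy vectors; the axes loosely mean (royalty, gender, person-ness).
# Real embeddings are learned, not hand-written, and are much higher-dimensional.
TOY_VECTORS = {
    "king":  [0.9,  0.8,  0.7],
    "queen": [0.9, -0.8,  0.7],
    "man":   [0.1,  0.8,  0.7],
    "woman": [0.1, -0.8,  0.7],
    "apple": [0.0,  0.0, -0.9],
}

def cosine(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def solve_analogy(a, b, c, vectors):
    """Answer "a is to b as c is to ?" via vec(b) - vec(a) + vec(c).

    The three input words are excluded from the candidates.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

# man is to king as woman is to ?
print(solve_analogy("man", "king", "woman", TOY_VECTORS))  # prints: queen
```

With learned embeddings the result is the nearest neighbor of the target vector rather than an exact match, so the answer lands near “queen” instead of exactly on it.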

Cosine similarity, the workhorse metric

To compare embeddings, the most common choice is cosine similarity: the cosine of the angle between two vectors. It ignores length and focuses on direction, scoring 1 for identical orientation, 0 for perpendicular vectors, and −1 for opposite orientation. It is the workhorse operation of applied embeddings: rank a list of candidates by their cosine similarity to a query vector and you have the core of search, deduplication, clustering, and recommendation.
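Written out, cosine similarity is the dot product of the two vectors divided by the product of their lengths: cos θ = (a · b) / (‖a‖ ‖b‖). The following is a minimal sketch in plain Python; the function name cosine_similarity and the sample vectors are illustrative choices, not part of any particular library.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

v = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]   # v scaled by 2: longer, same orientation
opposite = [-1.0, -2.0, -3.0]      # v flipped
perpendicular = [3.0, 0.0, -1.0]   # dot product with v is 0

print(round(cosine_similarity(v, same_direction), 6))  # 1.0
print(round(cosine_similarity(v, opposite), 6))        # -1.0
print(round(cosine_similarity(v, perpendicular), 6))   # 0.0
```

Doubling the length of v leaves the score at 1, which is the sense in which the metric ignores magnitude.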

Powering search and recommendations

Semantic search embeds every document ahead of time and stores the vectors. At query time it embeds the question and returns the nearest documents by cosine similarity, finding results by meaning rather than exact words. Recommendation systems work the same way: represent users and items as vectors and suggest items whose embeddings sit closest to a user’s. Because the document embeddings are precomputed, the only query-time work is embedding the query and finding its nearest neighbors, and approximate nearest-neighbor indexes keep that lookup fast even over millions of items.
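The precompute-then-rank pattern might look like the sketch below. The document titles, the three-dimensional vectors, and the query vector are all made up to stand in for the output of an embedding model, and search is a hypothetical helper. The vectors are normalized once ahead of time so that cosine similarity at query time reduces to a dot product.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Hypothetical document vectors, as if produced ahead of time by an embedding model.
DOC_VECTORS = {
    "Guide to buying a used automobile": [0.9, 0.1, 0.0],
    "Sourdough bread for beginners":     [0.0, 0.2, 0.9],
    "Electric vehicle charging basics":  [0.8, 0.4, 0.1],
}

# Offline step: normalize once, so the dot product of unit vectors is the cosine.
INDEX = {title: normalize(vec) for title, vec in DOC_VECTORS.items()}

def search(query_vector, index, k=2):
    """Return the k titles whose vectors are most similar to the query."""
    q = normalize(query_vector)
    scored = [(sum(a * b for a, b in zip(q, vec)), title)
              for title, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:k]]

# Stand-in for the embedding of the query "car"; it shares no words with the titles.
query_vector = [1.0, 0.2, 0.0]
print(search(query_vector, INDEX))
# ['Guide to buying a used automobile', 'Electric vehicle charging basics']
```

This version scans every document, so its cost grows linearly with the collection. Large systems typically swap the scan for an approximate nearest-neighbor index while keeping the same embed-then-rank shape.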

Embeddings and retrieval-augmented generation

One of the most common modern uses is retrieval-augmented generation (RAG). You split your documents into chunks, embed each chunk, and store them. When a user asks a question, you embed the question, retrieve the most similar chunks, and feed them into a language model’s prompt as context. This grounds the model in your own, up-to-date data and can substantially reduce hallucination, all built on the same vector similarity used for analogies and search.
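The retrieval half of that pipeline can be sketched as follows. This is a simplified illustration under several assumptions: toy_embed is a word-counting placeholder for a real embedding model (so it matches on vocabulary rather than meaning), VOCAB and the chunk texts are invented, the chunks are already split, and the final call to a language model is left out. The embed function is passed in as a parameter so a real model could be substituted.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical vocabulary for the placeholder embedding below.
VOCAB = ["refund", "days", "shipping", "warranty", "battery"]

def toy_embed(text):
    """Placeholder for a real embedding model: counts a few vocabulary words."""
    tokens = [t.strip(".,?!").lower() for t in text.split()]
    return [float(tokens.count(word)) for word in VOCAB]

def build_rag_prompt(question, chunks, embed, k=1):
    """Retrieve the k chunks most similar to the question and build a prompt."""
    # In a real system the chunk vectors are computed once and stored.
    chunk_vectors = [(chunk, embed(chunk)) for chunk in chunks]
    q = embed(question)
    ranked = sorted(chunk_vectors, key=lambda cv: cosine(q, cv[1]), reverse=True)
    context = "\n".join(chunk for chunk, _ in ranked[:k])
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Invented, pre-split chunks standing in for your own documents.
chunks = [
    "Refund requests are accepted within 30 days of purchase.",
    "Standard shipping takes five business days.",
    "The battery warranty lasts two years.",
]
prompt = build_rag_prompt("How long do I have to request a refund?", chunks, toy_embed)
print(prompt)  # the prompt contains only the refund chunk as context
```

The resulting string is what would be sent to the language model, which then answers from the retrieved chunk rather than from its training data alone.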