Question 1

What is an embedding?

Accepted Answer

An embedding is a list of numbers (a vector) that represents the meaning of a piece of data such as a word, sentence, or image. Items with similar meaning produce vectors that are close together in the vector space.

Question 2

How are embeddings used in search?

Accepted Answer

You embed every document ahead of time, embed the query when it arrives, and then find the documents whose vectors are closest to the query vector. This finds results by meaning rather than exact keywords, so 'car' can match 'automobile'.
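As a rough sketch of that lookup, the snippet below ranks a handful of documents against a query vector using cosine similarity. The three-number vectors are made up by hand for illustration (a real embedding model would produce them from the text), and the names `DOCS`, `QUERY_VEC`, and `search` are hypothetical.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# In practice an embedding model would generate these from the text.
DOCS = {
    "automobile repair tips": [0.9, 0.1, 0.0],
    "banana bread recipe": [0.0, 0.2, 0.9],
    "used truck listings": [0.8, 0.3, 0.1],
}


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, docs, top_k=2):
    """Return the top_k document texts most similar to query_vec."""
    scored = [(cosine(query_vec, vec), text) for text, vec in docs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:top_k]]


# Pretend this is the embedding of the query "car".
QUERY_VEC = [1.0, 0.1, 0.0]
results = search(QUERY_VEC, DOCS)
print(results)
```

The query shares no keyword with "automobile repair tips", yet that document ranks first because its vector points in almost the same direction. Comparing against every document like this is fine for small collections; larger ones usually rely on an approximate nearest-neighbor index instead.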

Question 3

How is similarity between embeddings measured?

Accepted Answer

The most common measure is cosine similarity, which is the cosine of the angle between two vectors and ranges from -1 to 1. A score near 1 means the vectors point in nearly the same direction, so higher scores mean the two items are more semantically related.
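To make the score range concrete, here is a minimal sketch of cosine similarity in plain Python, applied to three hand-picked pairs of vectors. The function name `cosine_similarity` and the example vectors are illustrative assumptions, not taken from any particular library.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Same direction, different length: score is (approximately) 1.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])

# At right angles: score is 0.
perpendicular = cosine_similarity([1.0, 0.0], [0.0, 1.0])

# Exactly opposite directions: score is (approximately) -1.
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])

print(same_direction, perpendicular, opposite)
```

Because the score depends only on direction, doubling a vector's length does not change it. Note that the function divides by the vector lengths, so it will fail on an all-zero vector.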

Question 4

How big are typical embedding vectors?

Accepted Answer

Common text embedding models output vectors with a few hundred to a few thousand dimensions, for example 384, 768, 1,536, or 3,072 numbers. More dimensions can capture more nuance but cost more to store and compare, since both grow in proportion to the number of dimensions.
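A back-of-the-envelope calculation shows how the storage cost scales. The sketch below assumes each number is stored as a 32-bit float (4 bytes) and uses a hypothetical collection of one million vectors; real systems may store values in other formats or compress them.

```python
# Assumption: each dimension is stored as a 32-bit float (4 bytes).
BYTES_PER_FLOAT32 = 4

# Hypothetical collection size, chosen only for illustration.
NUM_VECTORS = 1_000_000


def storage_bytes(num_vectors, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw bytes needed to store the vectors, ignoring index overhead."""
    return num_vectors * dimensions * bytes_per_value


for dims in (384, 768, 1536, 3072):
    gigabytes = storage_bytes(NUM_VECTORS, dims) / 1e9
    print(f"{dims:>5} dimensions: {gigabytes:.2f} GB")
```

Under these assumptions, one million 384-dimension vectors take roughly 1.5 GB, while the same collection at 3,072 dimensions takes roughly 12.3 GB, eight times as much.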

Embedding (AI Glossary)

Definition

What can be embedded

How similarity is measured

Dimensions and models

Why embeddings matter