What semantic search is
Semantic search finds documents by meaning rather than exact words. Instead of matching the literal tokens in a query, it embeds both the query and your documents into vectors and returns the documents whose vectors point in the most similar direction. A search for “cheap flights” can then surface a passage about “low-cost airfare” even though the two share no keywords, a match that plain keyword search misses unless it is given synonym rules.
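To make "most similar direction" concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors are invented for illustration; a real embedding model would produce hundreds or thousands of dimensions, and the values here are assumptions, not output from any model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, made up for this example.
query = [0.9, 0.8, 0.1]        # "cheap flights"
doc_airfare = [0.8, 0.9, 0.2]  # "low-cost airfare"
doc_recipe = [0.1, 0.0, 0.9]   # "banana bread recipe"

sim_airfare = cosine_similarity(query, doc_airfare)
sim_recipe = cosine_similarity(query, doc_recipe)

# The airfare passage points in nearly the same direction as the query,
# so it scores higher than the unrelated recipe.
print(f"airfare: {sim_airfare:.3f}, recipe: {sim_recipe:.3f}")
```

Because cosine similarity depends only on direction, the length of a vector does not change the result.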
How the pipeline works
There are two phases. Indexing: each document is embedded into a vector and stored, with its text and metadata, in a vector database that builds an approximate nearest-neighbor index. Querying: the user’s natural-language query is embedded with the same model, the store returns the nearest vectors by cosine similarity, and an optional re-ranker reorders the top candidates for sharper results before you display them.
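The two phases can be sketched in a few lines of Python. This is a toy under stated assumptions, not a production design: `embed` is a hypothetical stand-in that maps words to hand-picked concept dimensions (a real system would call an embedding model), `VectorStore` is an illustrative in-memory class rather than any real database's API, and the search is a brute-force scan instead of an approximate nearest-neighbor index. The re-ranking step is omitted.

```python
import math

# Hypothetical concept lexicon standing in for a real embedding model:
# words with related meanings share a dimension.
CONCEPTS = {
    "cheap": 0, "low-cost": 0, "budget": 0,
    "flights": 1, "airfare": 1, "airline": 1,
    "recipe": 2, "bread": 2, "baking": 2,
}
DIM = 3

def embed(text):
    """Toy embedding: count concept hits, then normalize to unit length."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        token = token.strip(".,!?")
        if token in CONCEPTS:
            vec[CONCEPTS[token]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

class VectorStore:
    """Illustrative in-memory store; brute-force search, no ANN index."""

    def __init__(self):
        self.rows = []  # (vector, text, metadata)

    def add(self, text, metadata):
        # Indexing phase: embed once and keep text and metadata alongside.
        self.rows.append((embed(text), text, metadata))

    def search(self, query, k=2):
        # Query phase: embed with the same function, then rank by cosine
        # similarity (a dot product, since all vectors are unit length).
        q = embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), text, meta)
            for vec, text, meta in self.rows
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]

store = VectorStore()
store.add("Low-cost airfare deals for spring", {"type": "travel"})
store.add("A simple banana bread recipe", {"type": "food"})
store.add("Airline baggage rules explained", {"type": "travel"})

results = store.search("cheap flights", k=2)
for score, text, meta in results:
    print(f"{score:.2f}  {text}  {meta}")
```

The query shares no words with the top result; they match only because the toy lexicon places "cheap" with "low-cost" and "flights" with "airfare", which is the role a trained embedding model plays at scale.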
The demo below holds a small sample corpus. Type a query and watch it rank the documents by semantic relevance — notice how it favors meaning over exact word overlap, the core behavior of a real engine.
Tips for production semantic search
Consider hybrid search: blend the semantic score with a keyword signal like BM25 so exact names, codes, and rare terms are not lost. Retrieve more candidates than you show (the top 20-50) and re-rank them with a cross-encoder for the best final ordering. Always store source metadata so you can filter by date, type, or permissions. Cache embeddings for popular queries. Re-embed a document whenever it changes, and re-embed the whole corpus when you switch embedding models, since vectors from different models are not comparable.
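One simple way to blend the two signals is a weighted sum of normalized scores, sketched below. The function names, the `alpha` weight, the document IDs, and the score values are all assumptions made for illustration; min-max normalization is just one option, and rank-based fusion is a common alternative.

```python
def min_max(scores):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_rank(semantic, keyword, alpha=0.5):
    """Rank doc IDs by alpha * semantic + (1 - alpha) * keyword."""
    sem, kw = min_max(semantic), min_max(keyword)
    docs = set(sem) | set(kw)
    blended = {
        d: alpha * sem.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0)
        for d in docs
    }
    return sorted(blended, key=blended.get, reverse=True)

# Hypothetical scores for a query containing a rare product code.
semantic = {"doc_a": 0.82, "doc_b": 0.79, "doc_c": 0.40}  # cosine similarity
keyword = {"doc_a": 1.2, "doc_b": 7.5, "doc_c": 0.0}      # BM25-style score

ranking = hybrid_rank(semantic, keyword, alpha=0.5)
print(ranking)
```

With these numbers, semantic similarity alone would put doc_a first, while the blend promotes doc_b because it contains the exact term. Tuning `alpha` shifts the balance between meaning and exact matches.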