Why not just use keyword search for docs?

Keyword search fails when users describe a problem in different words than the docs use: searching "app keeps logging me out" misses a page titled "session expiry configuration." Semantic search matches on meaning via embeddings, so it can find the right page even with no shared keywords. That is the single biggest win of AI docs search over a plain keyword search box.

Do I need a dedicated vector database?

Not at first. For a few thousand chunks, the pgvector extension on Postgres or even an in-memory index is plenty fast and far simpler to run. Reach for a dedicated vector database only when you have hundreds of thousands of chunks or need very low latency at scale. Start simple and migrate if you actually hit a limit.
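
As a rough illustration of how little an in-memory index needs, here is a minimal sketch in plain Python. The Chunk class, the top_k function, and the two-dimensional toy vectors are all hypothetical; real embeddings come from whichever embedding model you choose and have far more dimensions.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str
    embedding: list[float]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_embedding, chunks, k=3):
    """Brute-force scan: score every chunk and return the k best (score, chunk) pairs."""
    scored = [(cosine_similarity(query_embedding, c.embedding), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

A linear scan like this touches every chunk on every query, which is why it suits a few thousand chunks and not hundreds of thousands.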

How big should each chunk be?

A few hundred tokens — roughly a section or two — with a small overlap between chunks so a passage that straddles a boundary is not lost. Too small and a chunk lacks context; too large and the retrieved passage is mostly irrelevant text that dilutes the answer. Chunking by heading is a good, structure-aware default for documentation.

How do I stop the AI from answering when the docs do not cover it?

Instruct the model to answer only from the retrieved passages and to say it does not know when they do not contain the answer, then show the cited sources so users can check. This grounding-plus-citation pattern is what keeps a docs search honest. Never let the model answer from general knowledge, or it will confidently describe features you do not have.

How do I keep the search index up to date?

Re-embed pages that changed as part of your deploy pipeline. Track a content hash per page, and on each build re-embed only the pages whose hash changed, deleting chunks for removed pages. This keeps the index in lockstep with the published docs without re-embedding everything every time, which would be slow and costly.
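
The hash comparison can be sketched in a few lines. This is only an illustration under assumptions: plan_reindex is a hypothetical helper, pages are keyed by URL, and stored_hashes stands for whatever mapping your pipeline saved on the previous build.

```python
import hashlib


def content_hash(text):
    """Stable fingerprint of a page's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(current_pages, stored_hashes):
    """Compare this build's pages against the hashes saved by the last build.

    current_pages: {url: page_text} for the docs being published now.
    stored_hashes: {url: hash} recorded on the previous run.
    Returns (urls to re-embed, urls whose chunks should be deleted, new hashes).
    """
    new_hashes = {url: content_hash(body) for url, body in current_pages.items()}
    to_embed = sorted(u for u, h in new_hashes.items() if stored_hashes.get(u) != h)
    to_delete = sorted(u for u in stored_hashes if u not in new_hashes)
    return to_embed, to_delete, new_hashes
```

When a page is re-embedded, its old chunks should be deleted first, since a changed page may split into a different number of chunks.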

How to Build an AI Search Engine for Your Documentation

The default documentation search box matches keywords, so it fails exactly when users need it most — when they describe their problem in words the docs do not use. An AI search engine matches on meaning: it embeds every page, finds the passages most relevant to a question regardless of wording, and answers in natural language with citations back to the source. This tutorial walks the full pipeline — chunk, embed, store, query, cite, refresh — and the generator below produces a starter config for your stack.

Step 1 — Chunk and embed your docs

Embedding a whole page as a single vector blurs all of its topics together, so you embed chunks instead. Split each page into passages of a few hundred tokens, ideally along heading boundaries so each chunk is a coherent section, with a small overlap so a passage that spans a boundary is not lost.
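
One way to sketch heading-aware chunking with overlap is shown below. It assumes the pages are Markdown with # headings and approximates tokens by whitespace-separated words; both are simplifications, and split_by_heading and chunk_section are hypothetical names. It also does not skip # lines inside fenced code blocks, which a production splitter would need to handle.

```python
import re


def split_by_heading(markdown):
    """Return (heading, body) pairs, one per Markdown heading."""
    sections = []
    heading, lines = "", []
    for line in markdown.splitlines():
        if re.match(r"^#{1,6}\s+", line):
            if heading or lines:
                sections.append((heading, "\n".join(lines).strip()))
            heading, lines = line.lstrip("#").strip(), []
        else:
            lines.append(line)
    if heading or lines:
        sections.append((heading, "\n".join(lines).strip()))
    return sections


def chunk_section(heading, body, max_words=300, overlap=40):
    """Split one section into word windows that share `overlap` words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = body.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        window = words[start:start + max_words]
        chunks.append({"heading": heading, "text": " ".join(window)})
        if start + max_words >= len(words):
            break  # the last window already reached the end of the section
    return chunks
```

Sections shorter than max_words come back as a single chunk, so short pages stay intact and only long sections are windowed.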

For each chunk, call an embedding model to get a vector, and store the vector together with the chunk text and its metadata — source page, heading, URL — in a database. For most docs sites, Postgres with the pgvector extension is more than enough; you do not need a dedicated vector database until you are at large scale.
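
For the pgvector route, the storage side might look like the sketch below. The table name doc_chunks, its columns, and the 1536 dimension are assumptions for illustration (the dimension must match your embedding model), and the statements are meant to be run through whatever Postgres driver you already use; only the string formatting is shown here.

```python
# Schema for a hypothetical doc_chunks table; vector(1536) is an assumed dimension.
CREATE_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS doc_chunks (
    id        bigserial PRIMARY KEY,
    page_url  text NOT NULL,
    heading   text NOT NULL,
    content   text NOT NULL,
    embedding vector(1536)
);
"""

INSERT_SQL = (
    "INSERT INTO doc_chunks (page_url, heading, content, embedding) "
    "VALUES (%s, %s, %s, %s::vector)"
)

# <=> is pgvector's cosine distance operator; smaller means more similar.
SEARCH_SQL = (
    "SELECT page_url, heading, content FROM doc_chunks "
    "ORDER BY embedding <=> %s::vector LIMIT %s"
)


def to_vector_literal(embedding):
    """Format a list of floats in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def insert_params(page_url, heading, content, embedding):
    """Parameter tuple matching the placeholders in INSERT_SQL."""
    return (page_url, heading, content, to_vector_literal(embedding))
```

Passing the vector as a text literal cast with ::vector avoids needing a driver-specific adapter, at the cost of a little string formatting.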

Step 2 — Build the query path

At query time you mirror the indexing step. Embed the user’s question with the same model, run a vector similarity search to retrieve the top-k most relevant chunks, and pass those chunks to an LLM with a tight instruction:

Answer the question using ONLY the passages below.
If they do not contain the answer, say you don't know.
Cite the source heading for each fact you use.

Passages:
{retrieved_chunks}

Question: {user_question}

This retrieval-augmented pattern is what keeps answers grounded in your actual docs rather than the model’s general training.
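
Filling the template above from retrieved chunks could look like the following sketch. The chunk dictionaries with heading and text keys, and the bracketed heading label in front of each passage, are assumptions made for the example; the call to the LLM itself is left out because it depends on your provider.

```python
PROMPT_TEMPLATE = """Answer the question using ONLY the passages below.
If they do not contain the answer, say you don't know.
Cite the source heading for each fact you use.

Passages:
{retrieved_chunks}

Question: {user_question}"""


def format_passages(chunks):
    """Label each passage with its heading so the model has something to cite."""
    return "\n\n".join(f"[{c['heading']}]\n{c['text']}" for c in chunks)


def build_prompt(question, chunks):
    """Fill the template with the top-k chunks and the user's question."""
    return PROMPT_TEMPLATE.format(
        retrieved_chunks=format_passages(chunks),
        user_question=question,
    )
```

Keeping the heading next to each passage is what lets the model follow the "cite the source heading" instruction.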

Step 3 — Cite sources and keep the index fresh

Always return the source page and heading for every passage you used, rendered as clickable links, so a user can verify the answer and read more. Grounding the model and showing its sources is what turns a chatbot into a trustworthy docs search.
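
Rendering the sources can be as simple as the sketch below, which assumes each retrieved chunk carries url and heading metadata and that the links are emitted as HTML anchors. render_citations is a hypothetical helper, not part of any library.

```python
from html import escape


def render_citations(chunks):
    """Return one HTML link per distinct (url, heading), in retrieval order."""
    seen = set()
    links = []
    for c in chunks:
        key = (c["url"], c["heading"])
        if key in seen:
            continue  # several chunks can come from the same section
        seen.add(key)
        href = escape(c["url"], quote=True)
        label = escape(c["heading"])
        links.append(f'<a href="{href}">{label}</a>')
    return links
```

Deduplicating matters because overlapping chunks from one section would otherwise show the same link twice.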

Finally, keep the index in lockstep with your published docs. Track a content hash per page, and on each deploy re-embed only the pages that changed and delete chunks for pages that were removed. Use the generator below to scaffold the config for your embedding model, store, and chunk size, then build out the semantic search query path from Step 2. If embeddings are new to you, review how they work before tuning chunk size or top-k.