The default documentation search box matches keywords, so it fails exactly when users need it most — when they describe their problem in words the docs do not use. An AI search engine matches on meaning: it embeds every page, finds the passages most relevant to a question regardless of wording, and answers in natural language with citations back to the source. This tutorial walks the full pipeline — chunk, embed, store, query, cite, refresh — and the generator below produces a starter config for your stack.
Step 1 — Chunk and embed your docs
A single vector for a whole page blurs all of its topics together, so you embed chunks instead. Split each page into passages of a few hundred tokens, ideally along heading boundaries so each chunk is a coherent section, with a small overlap so a passage that spans a boundary is not lost.
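The chunking described above can be sketched as follows. This is a minimal illustration, not a production splitter: it assumes the pages are Markdown with `#` headings, it counts whitespace-separated words as a rough stand-in for tokens, and the name `chunk_page` and its defaults are hypothetical.

```python
import re


def chunk_page(text, max_words=200, overlap=30):
    """Split a page on heading lines, then window long sections with overlap."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")

    # Pass 1: group body text under the most recent heading.
    sections = []
    heading, body = "", []
    for line in text.splitlines():
        if re.match(r"^#{1,6}\s", line):  # assumption: Markdown-style headings
            if body:
                sections.append((heading, " ".join(body).split()))
            heading, body = line.lstrip("#").strip(), []
        elif line.strip():
            body.append(line.strip())
    if body:
        sections.append((heading, " ".join(body).split()))

    # Pass 2: slide a window over each section so neighbours share `overlap` words.
    chunks = []
    step = max_words - overlap
    for heading, words in sections:
        for start in range(0, len(words), step):
            window = words[start:start + max_words]
            chunks.append({"heading": heading, "text": " ".join(window)})
            if start + max_words >= len(words):
                break  # this window already reached the end of the section
    return chunks
```

Chunks never cross a heading, so each one stays a coherent section, and the overlap only applies inside a section that is too long for one window. A real pipeline would likely count tokens with the embedding model's own tokenizer rather than words.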
For each chunk, call an embedding model to get a vector, and store the vector together with the chunk text and its metadata (source page, heading, URL) in a database. For most docs sites, Postgres with the pgvector extension is more than enough; you do not need a dedicated vector database until you are at large scale.
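As a rough sketch of the storage step, the snippet below prepares rows for a pgvector table. Several things here are assumptions made for illustration: the table name `doc_chunks`, its column names, the toy dimension of 8, and above all `fake_embed`, which is a deterministic placeholder and not a real embedding model. Swap in your provider's embedding call and set the `vector(...)` dimension to match its output.

```python
import hashlib

EMBEDDING_DIM = 8  # toy size for the sketch; real models return far more dimensions

# pgvector provides the `vector` column type once the extension is enabled.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS doc_chunks (
    id        bigserial PRIMARY KEY,
    url       text NOT NULL,
    heading   text NOT NULL,
    content   text NOT NULL,
    embedding vector(8)
);
"""

# Parameterised insert; the embedding is passed as text and cast to vector.
INSERT_SQL = (
    "INSERT INTO doc_chunks (url, heading, content, embedding) "
    "VALUES (%s, %s, %s, %s::vector)"
)


def fake_embed(text):
    """Placeholder for a real embedding call: deterministic, NOT semantic."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:EMBEDDING_DIM]]


def to_pgvector_literal(vec):
    """Format a list of floats in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(f"{x:.6f}" for x in vec) + "]"


def build_rows(url, chunks, embed=fake_embed):
    """Turn chunk dicts (heading, text) into parameter tuples for INSERT_SQL."""
    return [
        (url, c["heading"], c["text"], to_pgvector_literal(embed(c["text"])))
        for c in chunks
    ]
```

Each tuple from `build_rows` lines up with the placeholders in `INSERT_SQL`, so a Postgres driver can execute them in a batch. Keeping the URL and heading next to the vector is what makes citations possible later.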
Step 2 — Build the query path
At query time you mirror the indexing step. Embed the user’s question with the same model, run a vector similarity search to retrieve the top-k most relevant chunks, and pass those chunks to an LLM with a tight instruction:
Answer the question using ONLY the passages below.
If they do not contain the answer, say you don't know.
Cite the source heading for each fact you use.
Passages:
{retrieved_chunks}
Question: {user_question}
This retrieval-augmented pattern is what keeps answers grounded in your actual docs rather than in the model’s general training data.
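The query path can be sketched in a few lines. This version ranks chunks by cosine similarity in plain Python so that it runs without a database; the function names and the row shape (dicts with `heading`, `text`, and `embedding`) are assumptions for illustration. The question vector is expected to come from the same embedding model used at indexing time.

```python
import math

PROMPT_TEMPLATE = """Answer the question using ONLY the passages below.
If they do not contain the answer, say you don't know.
Cite the source heading for each fact you use.
Passages:
{retrieved_chunks}
Question: {user_question}"""


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, rows, k=3):
    """Return the k rows whose embeddings are most similar to the query vector."""
    ranked = sorted(rows, key=lambda r: cosine(query_vec, r["embedding"]), reverse=True)
    return ranked[:k]


def build_prompt(question, passages):
    """Fill the template, labelling each passage with its heading for citation."""
    block = "\n".join(f"[{p['heading']}] {p['text']}" for p in passages)
    return PROMPT_TEMPLATE.format(retrieved_chunks=block, user_question=question)
```

With pgvector the ranking would normally happen in SQL instead, for example by ordering on the cosine distance operator `<=>` with a `LIMIT`, and only the prompt assembly would stay in application code.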
Step 3 — Cite sources and keep the index fresh
Always return the source page and heading for every passage you used, rendered as clickable links, so a user can verify the answer and read more. Grounding the model and showing its sources is what turns a chatbot into a trustworthy docs search.
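One way to render those sources is sketched below, assuming each retrieved passage carries the `url` and `heading` stored at indexing time. The function name and the HTML shape are illustrative choices, not a fixed format.

```python
import html


def render_citations(passages):
    """Build an HTML list of unique (url, heading) sources, in retrieval order."""
    seen = set()
    items = []
    for p in passages:
        key = (p["url"], p["heading"])
        if key in seen:
            continue  # several chunks can come from the same section
        seen.add(key)
        items.append(
            '<li><a href="{}">{}</a></li>'.format(
                html.escape(p["url"], quote=True),
                html.escape(p["heading"]),
            )
        )
    return "<ul>" + "".join(items) + "</ul>"
```

Escaping both the URL and the heading matters because headings come from your docs content and may contain characters such as `&` or `<`.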
Finally, keep the index in lockstep with your published docs. Track a content hash per page, and on each deploy re-embed only the pages that changed and delete chunks for pages that were removed. Use the generator below to scaffold the config for your embedding model, store, and chunk size, then build out the semantic search query path and review how embeddings work.
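The per-page hash check can be sketched like this. It assumes you keep a mapping from page URL to the hash recorded at the last indexing run; `plan_refresh` and both dictionary shapes are hypothetical names for illustration.

```python
import hashlib


def content_hash(text):
    """Stable fingerprint of a page's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_refresh(published, indexed_hashes):
    """Decide what to re-embed and what to delete on a deploy.

    published:      url -> current page text
    indexed_hashes: url -> hash stored when the page was last embedded
    """
    # New pages have no stored hash; changed pages have a different one.
    to_embed = [
        url for url, text in published.items()
        if indexed_hashes.get(url) != content_hash(text)
    ]
    # Pages that were indexed before but are no longer published.
    to_delete = [url for url in indexed_hashes if url not in published]
    return sorted(to_embed), sorted(to_delete)
```

For a changed page, it is usually simplest to delete all of its existing chunks before inserting the freshly embedded ones, so that stale passages from the old version cannot be retrieved.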