What an LLM-powered knowledge base is
A knowledge base powered by a large language model lets your team ask questions in plain English and get answers drawn from your own documents — your Notion wiki, Confluence space, or Google Docs — instead of hunting through pages. The standard architecture is retrieval-augmented generation (RAG): you split documents into chunks, store a vector embedding of each chunk in an index, and at query time retrieve the most relevant chunks and hand them to the model as context. The model answers from your content, with citations, rather than from its training data. The planner below sizes the corpus and estimates the one-time and recurring cost so you can budget before you build.
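To make the retrieval step concrete, here is a minimal sketch of ranking chunks against a question by cosine similarity. The three-dimensional vectors, the chunk ids, and the function names `cosine` and `top_k` are hypothetical stand-ins for illustration. A real system would get its vectors from an embedding model and would normally use a vector index rather than a linear scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, index, k=2):
    """Return the k (chunk_id, score) pairs most similar to the query."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy vectors standing in for real embeddings (hypothetical data).
index = {
    "vacation-policy#0": [0.9, 0.1, 0.0],
    "expense-policy#0": [0.1, 0.9, 0.0],
    "onboarding#2": [0.6, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]  # pretend embedding of the user's question

hits = top_k(query, index, k=2)
for chunk_id, score in hits:
    print(f"{chunk_id}: {score:.3f}")
```

The chunk ids returned here are what you would attach to the answer as citations, alongside the chunk text passed to the model as context.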
How it works
There are four stages. Ingest: pull documents from your sources via their APIs, normalise to clean text, and strip boilerplate. Index: split each document into chunks (typically 300–800 tokens, with some overlap so that text cut at a chunk boundary keeps its surrounding context), embed every chunk with an embedding model, and store the vectors plus metadata in a vector index. Query: embed the user’s question, retrieve the top matching chunks, and pass them to the LLM with an instruction to answer only from the provided context. Refresh: re-embed documents when they change, using source webhooks for immediacy plus a periodic full sweep to catch anything the webhooks missed, so the index stays current. Re-embed only what changed to keep the recurring bill small.
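The Index and Refresh stages can be sketched in a few lines. This is an illustrative sketch, not a reference implementation. It assumes the text is already tokenised (any list of tokens will do), and it detects changes by comparing a SHA-256 hash of each document's text with the hash stored at the last indexing run. The names `chunk_tokens`, `content_hash`, and `docs_to_reembed` are hypothetical.

```python
import hashlib

def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks

def content_hash(text):
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def docs_to_reembed(current_docs, stored_hashes):
    """Return ids of documents that are new or whose text changed since the last run."""
    return [
        doc_id
        for doc_id, text in current_docs.items()
        if stored_hashes.get(doc_id) != content_hash(text)
    ]

# A 1,200-token document split into 500-token chunks with a 50-token overlap.
chunks = chunk_tokens(list(range(1200)), chunk_size=500, overlap=50)
print([len(c) for c in chunks])

# Only changed or new documents are queued for re-embedding.
stored = {"a": content_hash("hello"), "b": content_hash("old text")}
print(docs_to_reembed({"a": "hello", "b": "new text", "c": "brand new"}, stored))
```

Hashing whole documents is the simplest option. Hashing individual chunks instead would let you skip unchanged chunks inside an edited document, at the cost of more bookkeeping.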
Sizing, cost, and freshness
The two numbers that drive cost are the one-time embedding of your whole corpus and the recurring re-embedding of what changes. The tool above turns your document count, average length, chunk size, and change rate into chunk counts, token volume, an estimated embedding cost, and a storage footprint, then projects the monthly cost of keeping it fresh at your chosen cadence. Two practical notes. First, embedding is cheap relative to generation, so do not over-optimise it; the per-query LLM cost usually dominates at scale. Second, ground every answer in retrieved chunks with visible citations and an explicit “I don’t know” path, which is what separates a trustworthy knowledge base from a confident guesser. For keeping the running system honest after launch, see how to monitor AI apps in production.
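The arithmetic behind an estimate like this is simple enough to sketch. The version below is one plausible way to do it, not necessarily the planner's exact formula. It assumes every document has the average length, that overlapping tokens are embedded twice, that each changed document is fully re-embedded once per month, and that vectors are stored as 4-byte floats. The price and the 1,536-dimension default are placeholder assumptions, so substitute your provider's figures.

```python
import math

def estimate_embedding_budget(doc_count, avg_tokens_per_doc, chunk_size, overlap,
                              monthly_change_rate, price_per_million_tokens,
                              dims=1536, bytes_per_float=4):
    """Rough sizing for a RAG corpus; all inputs are caller-supplied assumptions."""
    step = chunk_size - overlap
    # Sliding windows: a short document still yields one chunk.
    chunks_per_doc = max(1, math.ceil((avg_tokens_per_doc - overlap) / step))
    total_chunks = doc_count * chunks_per_doc
    # Each extra chunk re-embeds `overlap` tokens from the previous one.
    embedded_tokens = doc_count * (avg_tokens_per_doc + (chunks_per_doc - 1) * overlap)
    one_time_cost = embedded_tokens / 1_000_000 * price_per_million_tokens
    return {
        "chunks_per_doc": chunks_per_doc,
        "total_chunks": total_chunks,
        "embedded_tokens": embedded_tokens,
        "one_time_cost": one_time_cost,
        # Assumes changed documents are re-embedded in full, once per month.
        "monthly_refresh_cost": one_time_cost * monthly_change_rate,
        # Raw vector bytes only; metadata and index overhead are not included.
        "storage_bytes": total_chunks * dims * bytes_per_float,
    }

# Hypothetical inputs: 10,000 documents of 1,200 tokens, 5% changing per month,
# priced at a placeholder 0.02 per million embedded tokens.
plan = estimate_embedding_budget(
    doc_count=10_000, avg_tokens_per_doc=1200, chunk_size=500, overlap=50,
    monthly_change_rate=0.05, price_per_million_tokens=0.02,
)
for key, value in plan.items():
    print(f"{key}: {value}")
```

With these placeholder inputs the corpus comes to 30,000 chunks and 13 million embedded tokens, which costs well under a unit of currency to embed once. That illustrates why embedding is rarely the line item worth optimising.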