What LlamaIndex does
LlamaIndex is a framework focused on the retrieval half of RAG: getting your data into a form an LLM can search, and getting the right pieces out at query time. It handles the unglamorous parts — reading files, splitting them into chunks, embedding them, storing the vectors, and retrieving the best matches — behind a small, consistent API. Where a hand-rolled RAG pipeline is dozens of lines, a basic LlamaIndex query engine is about five.
How it works
The flow has four stages, each with a named LlamaIndex concept. Data
connectors (like SimpleDirectoryReader) load files into Document objects.
A node parser splits those documents into Node chunks with metadata and
relationships to their neighbours. A VectorStoreIndex embeds every node and
stores the vectors — in memory by default, or in an external store like Qdrant
or pgvector via a StorageContext. Finally, a query engine (index.as_query_engine()) embeds the user’s question, retrieves the most similar
nodes, and synthesises an answer, returning the source nodes it drew on so the answer can be traced back to its sources.
For harder questions that span several sources, the SubQuestionQueryEngine decomposes the question into sub-questions, routes each to the right data source, and merges the answers. This is usually more reliable than stuffing everything into one retrieval, because each sub-question is matched only against the source that can answer it. The generator below builds a complete, runnable LlamaIndex query engine from your choices, including persistence and an optional external vector store.
Tips for production use
Persist your index with index.storage_context.persist() so you embed the corpus once
rather than on every run; this is usually the biggest time and cost saving. Tune the
node parser’s chunk size and overlap against your real questions before reaching
for a bigger model. Add metadata (source, page, section) to nodes so answers can
cite where they came from. Use the SubQuestionQueryEngine when a question needs
facts from multiple documents. And swap the in-memory store for a real vector
database before you scale past a few thousand nodes; the query code stays
identical and only the storage context changes.