What you are building
This tutorial builds a complete retrieval-augmented generation (RAG) pipeline on
top of a database you probably already run: PostgreSQL. With the pgvector
extension, Postgres stores embedding vectors and answers nearest-neighbour
queries in SQL, so you get semantic retrieval without standing up a separate
vector service. The payoff is one fewer moving part, transactional consistency
between your documents and their embeddings, and the ability to join vector
search against your normal relational tables.
How the pipeline works
There are two phases. Ingestion runs once per document: split it into
overlapping chunks, embed each chunk with a model, and INSERT the text plus its
vector(d) value into a table. You then build an HNSW index on the vector
column so similarity search stays fast as rows accumulate. Querying runs per
request: embed the user’s question, run ORDER BY embedding <=> $1 LIMIT k to
pull the k closest chunks, optionally add a WHERE clause for metadata
filtering, assemble the results into a context block, and prompt the model to
answer strictly from that context. The cosine operator <=> pairs with a
vector_cosine_ops index for normalised embeddings — getting that pairing right
is what keeps the index in play.
Tips and the planner below
Match your vector(d) dimension to the model exactly or inserts fail. Keep a
small chunk overlap so answers that straddle a boundary survive in at least one
chunk, and store the source document id and page in metadata columns so you can
cite and filter. Always instruct the model to say it could not find the answer
when retrieval comes back empty — the strongest guard against hallucination. The
planner below sizes your table, picks a sensible index, and estimates storage and
query cost from your own document profile.