Which distance operator should I use in pgvector?

It depends on how your embeddings are normalised. Use <=> (cosine distance) with normalised embeddings (the common case for OpenAI and most sentence models), <#> for negative inner product, and <-> for Euclidean (L2) distance. Match the operator to the index opclass (vector_cosine_ops for <=>, vector_ip_ops for <#>, vector_l2_ops for <->), or the index will not be used.
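As a rough illustration of why normalisation matters, the sketch below reimplements the three distance definitions in plain Python (pgvector computes them inside the database as <=>, <#> and <->). The sample vectors and helper names are made up for the example. For unit-length vectors all three distances are monotonic in the dot product, so they rank neighbours identically.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: the quantity behind <=>."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def negative_inner_product(a, b):
    """Negated dot product: the quantity behind <#>."""
    return -sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):
    """Euclidean distance: the quantity behind <->."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalise(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def rank(query, docs, distance):
    """Indices of docs ordered from nearest to farthest."""
    return sorted(range(len(docs)), key=lambda i: distance(query, docs[i]))

# Hypothetical toy embeddings, normalised to unit length.
query = normalise([1.0, 2.0, 0.5])
docs = [normalise(v) for v in ([1.0, 2.0, 0.4], [0.0, 1.0, 3.0], [2.0, 1.0, 0.0])]

orders = [rank(query, docs, d)
          for d in (cosine_distance, negative_inner_product, l2_distance)]
```

With unnormalised embeddings the three orderings can diverge, which is when the operator choice starts to change results.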

Do I need an HNSW index right away?

For a few thousand rows a sequential scan is fine and gives exact results. Once you reach tens of thousands of vectors, add an HNSW index (or IVFFlat) so queries do approximate nearest-neighbour search instead of scanning the whole table. HNSW gives the best recall-latency tradeoff for most apps.
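When the time comes to add the index, the statement is short. The helper below is a minimal sketch that only builds the SQL string; the table name chunks and column name embedding are assumptions for illustration, and m = 16 and ef_construction = 64 are pgvector's documented HNSW defaults, written out so they are easy to tune.

```python
def hnsw_index_sql(table, column, opclass="vector_cosine_ops",
                   m=16, ef_construction=64):
    """Build a CREATE INDEX statement for an HNSW index.

    table and column are interpolated as identifiers, so pass only
    trusted, hard-coded names here, never user input.
    """
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} {opclass}) "
        f"WITH (m = {int(m)}, ef_construction = {int(ef_construction)});"
    )

# Hypothetical table and column names.
ddl = hnsw_index_sql("chunks", "embedding")
```

Higher m and ef_construction generally trade slower builds and more memory for better recall, so it is worth measuring on your own data before changing them.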

How big can a vector column be?

pgvector can index vector columns of up to 2,000 dimensions with HNSW or IVFFlat; the vector type itself can store up to 16,000. text-embedding-3-small (1536) fits under the index limit directly. text-embedding-3-large (3072) does not, so indexing it requires the halfvec type (indexable up to 4,000 dimensions) or dimensionality reduction. Pick a model whose dimension fits your index plan before you ingest millions of rows.
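The sizing decision can be sketched as a small helper. It assumes the limits in the pgvector documentation (2,000 indexable dimensions for vector, 4,000 for halfvec) and its stated per-value storage of 4 × d + 8 bytes for vector and 2 × d + 8 bytes for halfvec. The function name and return shape are made up for the example, and the byte counts exclude row and index overhead.

```python
VECTOR_INDEX_MAX_DIMS = 2000    # index limit for the vector type
HALFVEC_INDEX_MAX_DIMS = 4000   # index limit for the halfvec type

def plan_column(dims):
    """Pick a column type that can still be indexed, plus bytes per value."""
    if dims <= VECTOR_INDEX_MAX_DIMS:
        return ("vector", 4 * dims + 8)      # 4-byte floats
    if dims <= HALFVEC_INDEX_MAX_DIMS:
        return ("halfvec", 2 * dims + 8)     # 2-byte floats
    raise ValueError(f"{dims} dimensions: reduce dimensionality before indexing")

small = plan_column(1536)   # text-embedding-3-small
large = plan_column(3072)   # text-embedding-3-large
```

Under these assumptions a 3072-dimension halfvec value takes the same space as a 1536-dimension vector value, at the cost of half-precision components.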

Can I filter by metadata and similarity at once?

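Yes, and this is one of pgvector's strengths. Add a WHERE clause on metadata columns alongside the ORDER BY similarity, and Postgres evaluates both in a single query. For heavy filtering, create btree indexes on those columns so the planner can prune rows before ranking, which keeps multi-tenant queries fast and scoped to the right tenant. One caveat: with an approximate index the filter is applied after the index scan, so a very selective filter can return fewer than k rows unless you raise hnsw.ef_search.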
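A filtered search might be assembled like this. The sketch only builds the SQL text; the table chunks and the columns id, content, embedding, tenant_id and doc_type are hypothetical. It uses PostgreSQL's $n placeholders, with $1 reserved for the query embedding, so filter values are always passed as parameters rather than spliced into the string.

```python
def build_filtered_query(filter_columns, k=5):
    """Build a similarity query with optional equality filters.

    filter_columns are identifiers and must be trusted, hard-coded names.
    Their values are bound separately as $2, $3, ... by the driver.
    """
    where = " AND ".join(
        f"{col} = ${i}" for i, col in enumerate(filter_columns, start=2)
    )
    sql = "SELECT id, content FROM chunks"
    if where:
        sql += f" WHERE {where}"
    sql += f" ORDER BY embedding <=> $1 LIMIT {int(k)}"
    return sql

# Hypothetical metadata columns for a multi-tenant table.
sql = build_filtered_query(["tenant_id", "doc_type"], k=3)
```

The parameters would then be supplied in order: the query embedding first, followed by one value per filter column.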

Why choose pgvector over a dedicated vector database?

If you already run PostgreSQL, pgvector means zero new infrastructure, transactional writes, and SQL joins between your vectors and your existing tables. It comfortably handles up to a few million vectors. Move to a dedicated store only when you outgrow that scale or need managed sharding and very low p99 latency.

How to Build RAG with pgvector and PostgreSQL

What you are building

This tutorial builds a complete retrieval-augmented generation (RAG) pipeline on top of a database you probably already run: PostgreSQL. With the pgvector extension, Postgres stores embedding vectors and answers nearest-neighbour queries in SQL, so you get semantic retrieval without standing up a separate vector service. The payoff is one fewer moving part, transactional consistency between your documents and their embeddings, and the ability to join vector search against your normal relational tables.

How the pipeline works

There are two phases. Ingestion runs once per document: split it into overlapping chunks, embed each chunk with a model, and INSERT the text plus its vector(d) value into a table. You then build an HNSW index on the vector column so similarity search stays fast as rows accumulate. Querying runs per request: embed the user’s question, run ORDER BY embedding <=> $1 LIMIT k to pull the k closest chunks, optionally add a WHERE clause for metadata filtering, assemble the results into a context block, and prompt the model to answer strictly from that context. The cosine operator <=> pairs with a vector_cosine_ops index for normalised embeddings — getting that pairing right is what keeps the index in play.
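The ingestion side can be sketched in a few lines. This is a minimal word-based splitter, not a recommendation for a particular chunk size; the function names and defaults are assumptions for illustration. The second helper formats a list of floats in the bracketed text form that pgvector accepts as vector input, which is handy when a driver has no native vector adapter.

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into chunks of chunk_size words that share overlap words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

def to_vector_literal(embedding):
    """Format floats as pgvector's text input form, e.g. [0.1,0.2]."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"

# Tiny demo: ten words, chunks of four words overlapping by one.
demo = chunk_text(" ".join(f"w{i}" for i in range(10)), chunk_size=4, overlap=1)
```

Each chunk would then be embedded and inserted together with its vector literal, and the same literal form serves as the $1 parameter at query time.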

Tips and the planner below

Match your vector(d) dimension to the model exactly or inserts fail. Keep a small chunk overlap so answers that straddle a boundary survive in at least one chunk, and store the source document id and page in metadata columns so you can cite and filter. Always instruct the model to say it could not find the answer when retrieval comes back empty — the strongest guard against hallucination. The planner below sizes your table, picks a sensible index, and estimates storage and query cost from your own document profile.
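The citation and not-found advice can be combined in the prompt builder. The sketch below assumes each retrieved row is a (content, doc_id, page) tuple; the wording of the instruction and the fallback sentence are placeholders to adapt, not a tested prompt.

```python
NOT_FOUND = "I could not find the answer in the provided documents."

def build_prompt(question, rows):
    """Assemble a grounded prompt from retrieved (content, doc_id, page) rows."""
    if rows:
        context = "\n\n".join(
            f"[{doc_id} p.{page}] {content}" for content, doc_id, page in rows
        )
    else:
        # Make empty retrieval explicit so the model has nothing to improvise from.
        context = "(no passages retrieved)"
    return (
        "Answer using only the context below and cite sources as [doc_id p.page]. "
        f'If the context does not contain the answer, reply exactly: "{NOT_FOUND}"\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Hypothetical retrieved row.
prompt = build_prompt("What is the refund window?",
                      [("Refunds are accepted within 30 days.", "policy-7", 3)])
empty_prompt = build_prompt("What is the refund window?", [])
```

An application could also skip the model call entirely when rows is empty and return the fallback sentence directly, which saves a request.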