How to Use Vector Databases: A Developer's Guide

Pinecone, pgvector, Weaviate, Qdrant — which and when

What a vector database does

A vector database stores high-dimensional embeddings (numeric fingerprints of text, images, or audio produced by a model) and answers the question “which stored items are most similar to this one?” That single capability powers semantic search, retrieval-augmented generation, recommendations, deduplication, and anomaly detection. The core operation is a k-nearest-neighbour search: given a query vector, return the k closest stored vectors, optionally restricted by structured metadata and ranked by a similarity or distance measure (cosine similarity, dot product, or Euclidean distance).
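To make the core operation concrete, here is a minimal brute-force sketch in plain Python. It scores every stored vector against the query with each of the three measures and keeps the best k. The toy 2-D vectors and the names `top_k` and `METRICS` are illustrative assumptions, not any database's API. Note that cosine similarity and dot product are higher-is-closer, while Euclidean distance is lower-is-closer.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product: larger means more similar (also sensitive to vector length)."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity: the dot product divided by both vectors' lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance: smaller means more similar."""
    return math.dist(a, b)


# metric name -> (scoring function, True if a higher score means "closer")
METRICS: Dict[str, Tuple[Callable[[Vector, Vector], float], bool]] = {
    "cosine": (cosine, True),
    "dot": (dot, True),
    "euclidean": (euclidean, False),
}


def top_k(query: Vector, items: Dict[str, Vector], k: int,
          metric: str = "cosine") -> List[Tuple[str, float]]:
    """Exact nearest-neighbour search: score every stored vector, keep the best k."""
    score, higher_is_better = METRICS[metric]
    scored = [(key, score(query, vec)) for key, vec in items.items()]
    pick = heapq.nlargest if higher_is_better else heapq.nsmallest
    return pick(k, scored, key=lambda pair: pair[1])


# Toy 2-D "embeddings"; real ones have hundreds or thousands of dimensions.
items = {
    "cat": [0.9, 0.1],
    "kitten": [0.85, 0.2],
    "car": [0.1, 0.95],
    "truck": [0.2, 0.9],
}
query = [1.0, 0.15]

for metric in METRICS:
    print(metric, top_k(query, items, k=2, metric=metric))
```

This exhaustive scan is what a flat index does; dedicated databases replace it with an approximate index once the collection grows.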

The reason you need a specialised store rather than a WHERE clause is that similarity is not equality. You are not looking for an exact match; you are ranking thousands or millions of candidates by closeness in a 384- to 3072-dimensional space, fast enough to serve a request. Doing that well is an indexing problem, which is where these systems earn their keep.

Picking an index: HNSW vs flat (and IVF)

The index is the single biggest performance decision. A flat index compares the query to every vector: exact, but its cost grows linearly with the collection, which is fine up to a few hundred thousand vectors. HNSW (Hierarchical Navigable Small World) builds a layered graph and greedily hops toward the nearest neighbours in roughly logarithmic time; it is the default for most workloads and gives up a small, tunable amount of recall for a large speedup. IVF (inverted file) partitions vectors into clusters and searches only the clusters closest to the query; it uses less memory than HNSW but is more sensitive to tuning (how many clusters to build, and how many to probe per query). For HNSW, the knobs that matter are m (the maximum links per node) and ef_construction, which together set build quality and memory, and ef_search, which trades query-time recall against latency. Always measure recall against a labelled set, such as the exact neighbours a flat search returns for a sample of real queries, rather than trusting defaults.
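Measuring recall needs no special tooling. The sketch below uses IVF rather than HNSW only because IVF fits in a few lines: `ToyIVF` is a hypothetical minimal index whose centroids are simply sampled from the data (a real IVF would refine them with k-means), and `nprobe` is the number of clusters searched per query. Ground truth comes from an exact flat search, and recall@10 is the fraction of the true top 10 that the approximate search also returns. The random data and all names are assumptions for illustration.

```python
import math
import random
from typing import Dict, List, Sequence

Vec = List[float]


def flat_search(query: Sequence[float], data: List[Vec], k: int) -> List[int]:
    """Exact search (what a flat index does): compare the query with every vector."""
    return sorted(range(len(data)), key=lambda i: math.dist(query, data[i]))[:k]


class ToyIVF:
    """Hypothetical minimal IVF index, for illustration only.

    Centroids are sampled from the data instead of trained with k-means, and each
    vector is stored in the inverted list of its nearest centroid.
    """

    def __init__(self, data: List[Vec], nlist: int, seed: int = 0) -> None:
        self.data = data
        self.centroids = random.Random(seed).sample(data, nlist)
        self.lists: List[List[int]] = [[] for _ in range(nlist)]
        for i, vec in enumerate(data):
            self.lists[self._closest_centroids(vec, 1)[0]].append(i)

    def _closest_centroids(self, vec: Sequence[float], n: int) -> List[int]:
        order = sorted(range(len(self.centroids)),
                       key=lambda c: math.dist(vec, self.centroids[c]))
        return order[:n]

    def search(self, query: Sequence[float], k: int, nprobe: int) -> List[int]:
        # Only vectors in the nprobe closest clusters are compared with the query.
        candidates = [i for c in self._closest_centroids(query, nprobe)
                      for i in self.lists[c]]
        return sorted(candidates, key=lambda i: math.dist(query, self.data[i]))[:k]


def recall_at_k(approx: List[int], exact: List[int]) -> float:
    """Fraction of the true top-k neighbours that the approximate search returned."""
    return len(set(approx) & set(exact)) / len(exact)


rng = random.Random(42)
DIM, K = 16, 10
data = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(2000)]
queries = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(50)]
truth = [flat_search(q, data, K) for q in queries]  # ground truth, computed once

index = ToyIVF(data, nlist=32)
mean_recall: Dict[int, float] = {}
for nprobe in (1, 4, 32):
    recalls = [recall_at_k(index.search(q, K, nprobe), t)
               for q, t in zip(queries, truth)]
    mean_recall[nprobe] = sum(recalls) / len(recalls)
    print(f"nprobe={nprobe:2d}  mean recall@{K} = {mean_recall[nprobe]:.2f}")
```

Because probing more clusters only adds candidates, recall can only rise as `nprobe` grows, and probing every cluster reproduces the exact result. HNSW's ef_search plays the analogous role on the graph side.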

Choosing a database

pgvector turns PostgreSQL into a vector store: no new infrastructure if you already run Postgres, transactional consistency, and SQL joins against your existing tables. It is ideal below a few million vectors. Pinecone is fully managed, scales transparently, and is the fastest path to production if you don’t want to run anything yourself. Qdrant, written in Rust, offers excellent filtering backed by payload indexes and is straightforward to self-host. Weaviate bundles modules that generate embeddings for you and supports hybrid search out of the box.

Decide on three axes: scale (millions vs billions of vectors), ops appetite (managed vs self-hosted), and whether you need tight relational joins, which favour pgvector. When in doubt, prototype on pgvector and graduate to a dedicated store only when a real bottleneck appears.
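A pgvector prototype needs little more than an extension, a vector column, and an index. The following sketch keeps the SQL as Python strings so it stays driver-agnostic. The table and column names (`documents`, `authors`, `tenant_id`, `author_id`), the 384-dimensional embedding size, and the helper `to_vector_literal` are hypothetical. The `hnsw` index method, the `vector_cosine_ops` operator class, the `hnsw.ef_search` setting, and the `<=>` cosine-distance operator are pgvector features; the values m = 16 and ef_construction = 64 are its documented defaults, but check everything against your installed version.

```python
from typing import Sequence

EMBEDDING_DIM = 384  # hypothetical: must equal your embedding model's output size

# One-time setup. Assumes an existing `authors` table with `id` and `name` columns.
SETUP_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS documents (
    id        bigserial PRIMARY KEY,
    tenant_id bigint NOT NULL,
    author_id bigint NOT NULL,
    title     text   NOT NULL,
    embedding vector({EMBEDDING_DIM}) NOT NULL
);

-- HNSW index for cosine distance; m and ef_construction control build quality.
CREATE INDEX IF NOT EXISTS documents_embedding_hnsw
    ON documents USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
"""

# Query-time knob, scoped to the current transaction: higher = better recall, slower.
EF_SEARCH_SQL = "SET LOCAL hnsw.ef_search = 100"

# Nearest neighbours by cosine distance (<=>), scoped to one tenant and joined
# to an ordinary relational table in the same statement.
SEARCH_SQL = """
SELECT d.id, d.title, a.name AS author,
       d.embedding <=> %(query)s::vector AS distance
FROM documents d
JOIN authors a ON a.id = d.author_id
WHERE d.tenant_id = %(tenant)s
ORDER BY d.embedding <=> %(query)s::vector
LIMIT %(k)s
"""


def to_vector_literal(vec: Sequence[float]) -> str:
    """Format a sequence of floats in pgvector's text form, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


# With a DB-API driver such as psycopg, inside one transaction:
#   cur.execute(EF_SEARCH_SQL)
#   cur.execute(SEARCH_SQL, {"query": to_vector_literal(embedding), "tenant": 7, "k": 5})
```

The search joins a plain relational table and filters by tenant in one statement, which is the main draw of starting on pgvector.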

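Filtering and hybrid search

Production retrieval is rarely pure similarity. Metadata filtering restricts a search to vectors that match structured conditions (tenant, language, date, visibility) and is mandatory for multi-tenant safety. Prefer databases that apply the filter during index traversal rather than fetching the top neighbours and discarding non-matching ones afterward: with a selective filter, post-filtering can return far fewer results than requested, or none at all, which wrecks recall. Hybrid search fuses dense vectors with sparse keyword scoring (such as BM25) so you catch both paraphrases and exact terms like SKUs or surnames. Reciprocal rank fusion is a simple, robust way to combine the two rankings: each document scores the sum of 1/(k + rank) over the lists it appears in, with k a smoothing constant (commonly 60), so the raw scores never need to be normalised against each other. Together, good filtering plus hybrid scoring often moves retrieval quality more than swapping the embedding model.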
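Both ideas fit in a short sketch. `nearest` applies the filter before ranking (a brute-force stand-in for filtering during traversal, which real databases do inside the index), `post_filtered` shows the fetch-then-discard anti-pattern, and `reciprocal_rank_fusion` merges ranked ID lists using the conventional constant of 60. The tenants, vectors, and document IDs are made-up assumptions.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple


@dataclass
class Doc:
    id: str
    tenant: str
    vec: Tuple[float, ...]


def nearest(query: Sequence[float], docs: List[Doc], k: int,
            allowed: Optional[Callable[[Doc], bool]] = None) -> List[Doc]:
    """Exact k-NN by Euclidean distance, applying `allowed` BEFORE ranking."""
    pool = [d for d in docs if allowed is None or allowed(d)]
    return sorted(pool, key=lambda d: math.dist(query, d.vec))[:k]


def post_filtered(query: Sequence[float], docs: List[Doc], k: int,
                  allowed: Callable[[Doc], bool]) -> List[Doc]:
    """Anti-pattern: take the top k first, then drop docs that fail the filter."""
    return [d for d in nearest(query, docs, k) if allowed(d)]


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists: score(d) = sum of 1 / (k + rank), ranks starting at 1."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Made-up data: the query sits next to globex's vectors, but the caller is acme.
docs = [
    Doc("g1", "globex", (0.0, 0.0)),
    Doc("g2", "globex", (0.1, 0.0)),
    Doc("g3", "globex", (0.0, 0.1)),
    Doc("a1", "acme", (1.0, 1.0)),
    Doc("a2", "acme", (1.2, 1.0)),
    Doc("a3", "acme", (1.0, 1.3)),
]
query = (0.05, 0.05)


def is_acme(doc: Doc) -> bool:
    return doc.tenant == "acme"


print([d.id for d in post_filtered(query, docs, 3, is_acme)])
print([d.id for d in nearest(query, docs, 3, is_acme)])

dense = ["d1", "d2", "d3"]   # ranking from the vector search
sparse = ["d3", "d4", "d5"]  # ranking from keyword (e.g. BM25) search
print(reciprocal_rank_fusion([dense, sparse]))
```

With a selective tenant filter, the post-filtered query comes back empty while the pre-filtered one returns three matches, and in the fused ranking the document both retrievers agree on (`d3`) rises to the top.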