Question 1

Do I actually need a dedicated vector database?

Accepted Answer

Not always. If you already run PostgreSQL and have under a few million vectors, the pgvector extension is usually enough and saves you a moving part. Reach for a dedicated store like Pinecone, Qdrant, or Weaviate when you need to scale past tens of millions of vectors, want managed sharding, or need very low p99 latency under heavy concurrent load.

Question 2

What is the difference between HNSW and a flat index?

Accepted Answer

A flat (brute-force) index compares the query against every stored vector — perfectly accurate but slow once you have many vectors. HNSW builds a navigable graph that finds approximate nearest neighbours in roughly logarithmic time, trading a small amount of recall for a large speed gain. Use flat for small datasets or exact-recall needs, HNSW for everything at scale.

Question 3

What is metadata filtering and why does it matter?

Accepted Answer

Metadata filtering restricts a similarity search to vectors matching structured conditions — for example only documents where tenant_id equals a value or status equals published. It is essential for multi-tenant apps and security, because without it a search could surface another customer's data. Check that your chosen database filters efficiently rather than scanning everything post-search.

Question 4

What is hybrid search?

Accepted Answer

Hybrid search combines dense vector similarity (semantic meaning) with sparse keyword scoring (BM25 or full-text). Vectors catch paraphrases and concepts; keywords catch exact terms, product codes, and rare names that embeddings blur. Fusing both with a method like reciprocal rank fusion usually beats either alone on real retrieval tasks.

Question 5

Managed cloud or self-hosted?

Accepted Answer

Managed services (Pinecone, Weaviate Cloud, Qdrant Cloud) remove ops burden, handle scaling and backups, and are fastest to ship — at a recurring cost and some data-residency tradeoffs. Self-hosting pgvector, Qdrant, or Weaviate gives full control and lower long-run cost but means you own upgrades, replication, and tuning. Start managed to validate, move self-hosted if cost or compliance demands it.

How to Use Vector Databases: A Developer's Guide

What a vector database does

Picking an index: HNSW vs flat (and IVF)

Choosing a database

Metadata filtering and hybrid search