How is the raw vector size calculated?

Raw size is vector count multiplied by dimensions multiplied by the bytes per element — 4 for float32, 2 for float16, 1 for int8. For example one million 1536-dimensional float32 vectors is 1,000,000 × 1536 × 4 ≈ 5.7 GiB before any index overhead.

What overhead does HNSW add?

HNSW stores a navigable graph on top of the vectors. Each node keeps roughly M connections on the base layer plus a fraction on upper layers, and each link is a 4-byte node ID. The common estimate is about M × 2 links per node times 4 bytes, which this tool adds on top of the raw vectors. Higher M means better recall but more memory.

Why is IVF smaller than HNSW?

IVF (inverted file) clusters vectors into a number of partitions and stores only small per-cluster centroids plus the vectors, so its index overhead is far smaller than HNSW's graph. The trade-off is that IVF typically needs a probe parameter at query time and can have lower recall unless you scan more partitions.

Does quantization reduce the size?

Yes, dramatically. Switching from float32 to int8 cuts the raw vector storage to a quarter, and float16 halves it, with some loss of precision. Many production systems quantize to int8 or use product quantization to fit large indexes in memory; this calculator lets you compare the precision options directly.

Embedding Index Size Calculator

Know how much RAM your vectors need before you provision it

Vector indexes are memory-hungry, and the bill scales with three numbers: how many vectors you store, how many dimensions each has, and how you store them. On top of that, the index structure itself adds overhead — a lot for HNSW, very little for flat or IVF. This calculator combines all of that into a single size estimate so you can pick an instance class or storage tier with confidence instead of finding out after an out-of-memory crash.

How it works

The raw vector storage is vector count times dimensions times bytes per element, where float32 is 4 bytes, float16 is 2, and int8 is 1. On top of that the tool adds index-specific overhead. A flat index adds essentially nothing — it just scans the raw vectors. HNSW adds a graph: roughly M × 2 links per node, each a 4-byte node ID, which is why HNSW indexes are markedly larger and why the M parameter matters. IVF adds only small centroid storage for its clusters. The totals are summed and shown in GiB or MiB along with the breakdown.

Tips and notes

These are estimates of the dominant costs; real databases add metadata, payloads, deletion tombstones, and allocator overhead, so add headroom of 20–50% on top. The biggest lever is precision: moving from float32 to int8 quarters the raw storage, and product quantization can compress further at some recall cost. HNSW gives the best recall and speed but the highest memory — lower M to save RAM, raise it for recall. IVF and flat trade query speed for a smaller footprint. Use the calculator to compare configurations side by side before committing to a deployment size.