HNSW vs IVF-PQ

HNSW and IVF-PQ are two approximate nearest neighbor strategies used in vector search. HNSW is a graph-based index. IVF-PQ combines cluster-based search with product quantization, a compression method that stores compact vector codes instead of full vector values for most candidate scoring.

The comparison is not just graph versus clusters. It is graph traversal versus cluster probing plus lossy compression.

Short Answer

HNSW is usually better when low latency and high recall are the priority and enough memory is available.

IVF-PQ is usually worth considering when memory or storage efficiency is more important, the dataset is large, and the application can tolerate additional approximation or use rescoring to recover recall.

What HNSW Is

HNSW stands for Hierarchical Navigable Small World.

It stores vectors in a layered graph. Search starts in sparse upper layers, follows graph edges toward better candidates, and refines in the dense base layer.

HNSW reduces search work by navigating through neighbor links rather than scanning every vector.

What IVF-PQ Is

IVF-PQ combines two ideas:

IVF: partition vectors into centroid-based clusters or posting lists
PQ: compress vector representations using product quantization

At query time, the system chooses relevant clusters, evaluates compressed vector codes inside those clusters, and may rescore the best candidates with full vectors if the implementation supports it.

The Core Difference

The core difference is how each index reduces query work.

HNSW: searches by graph traversal.
IVF-PQ: searches selected clusters using compressed vector codes.

HNSW relies on graph quality. IVF-PQ relies on cluster quality, quantization quality, and probe settings.

How HNSW Searches

HNSW begins from an entry point in the graph.

It follows edges toward closer candidates and keeps a search queue of promising nodes. The search gradually moves from broad upper-layer navigation to detailed local search in the base layer.

The main query-time tuning knob is often a parameter such as ef, which controls how broadly the graph search explores.

How IVF-PQ Searches

IVF-PQ search usually happens in stages.

First, the query is compared with cluster centroids. Second, the system probes the most relevant clusters. Third, vectors inside those clusters are scored using compressed PQ codes. Fourth, some systems over-fetch candidates and rescore them with full vectors.

This design avoids scanning the full dataset and reduces the memory or disk footprint of candidate vectors.

IVF-PQ vs IVFFlat

IVFFlat and IVF-PQ both use cluster probing.

The difference is what happens inside selected clusters. IVFFlat compares against full vector values. IVF-PQ compares against compressed product-quantized codes, often with optional rescoring for the final candidates.

IVFFlat has less quantization error. IVF-PQ has stronger compression.

Product Quantization in Plain Terms

Product quantization compresses vectors by splitting each vector into smaller segments and representing each segment with a learned code.

Instead of storing every original floating-point dimension, the system stores compact identifiers that approximate the original vector. This can save a large amount of memory or storage.

The cost is information loss. Compressed vectors may not preserve all distances perfectly.

Recall Trade-Offs

HNSW recall depends on graph quality and search breadth.

IVF-PQ recall depends on several layers of approximation:

whether the right clusters are probed
whether the PQ codes preserve useful distance information
whether enough candidates are over-fetched
whether final rescoring uses full vectors

This means IVF-PQ can be very efficient, but it needs careful tuning to avoid recall loss.

Latency Trade-Offs

HNSW latency is driven by graph traversal and candidate distance comparisons.

IVF-PQ latency is driven by centroid lookup, number of probed clusters, compressed-code scoring, and any final rescoring. PQ scoring can be efficient because it works with compact representations, but probing too many clusters or rescoring too many candidates can increase latency.

The fastest setting is rarely the best setting. The right setting is the one that meets recall and latency targets together.

Memory Trade-Offs

Memory is one of the biggest reasons to consider IVF-PQ.

HNSW stores a graph and often needs fast access to vector values. The graph edges and vectors can consume substantial RAM at scale.

IVF-PQ reduces the size of vector representations by storing compact codes. This can make very large datasets more practical, especially when full-precision vectors are stored separately or fetched only for rescoring.

Build and Training Cost

HNSW build cost comes from constructing graph connections.

IVF-PQ build cost includes clustering vectors into IVF partitions and training the product quantizer codebooks. The training data should represent the actual vector distribution. If the training set is too small or unrepresentative, PQ quality can suffer.

IVF-PQ can therefore require more planning before indexing begins.

Update Behavior

HNSW can support incremental insertion, but graph maintenance has a cost.

IVF-PQ can assign new vectors to clusters and encode them with an existing quantizer, but distribution drift can become a problem. If new vectors differ from the training distribution, the old centroids or PQ codebooks may become less accurate.

For frequently changing datasets, update and retraining behavior should be part of the evaluation.

Filtering Behavior

Metadata filters can affect both approaches.

In HNSW, filters can make graph traversal harder if many nearby nodes are not eligible for the final result. In IVF-PQ, filters can leave selected clusters with too few eligible candidates, requiring more cluster probing or fallback behavior.

Filtering should be benchmarked with realistic selectivity.

When HNSW Is Usually Better

HNSW is usually a better fit when:

very low query latency is required
high recall matters more than memory savings
the index can fit in RAM
the workload has high query throughput
incremental graph indexing works well for the application
you want to avoid PQ training and compression tuning

When IVF-PQ Is Usually Better

IVF-PQ is usually worth considering when:

memory or storage cost is the main constraint
the dataset is very large
some recall loss is acceptable
rescoring can recover enough final ranking quality
the data distribution is stable enough for trained quantizers
cluster-based probing fits the workload

Parameter Analogy

A rough analogy is:

ef in HNSW controls graph search breadth.
nprobe in IVF-PQ controls how many clusters are searched.
PQ settings control how much vector information is compressed away.

HNSW mainly tunes traversal breadth and graph quality. IVF-PQ tunes cluster coverage, compression quality, and rescoring depth.

Simple Example

Suppose you have hundreds of millions of image embeddings.

HNSW would build a graph of image vectors and navigate through that graph at query time. It may deliver strong recall and latency, but the memory cost can be high.

IVF-PQ would group image vectors into clusters and store compressed codes for candidates. A query would probe relevant clusters and score compact representations, using far less memory but with more approximation.

Common Misunderstandings

Common misunderstandings include:

thinking IVF-PQ is just IVF with a small implementation detail
forgetting that PQ is lossy
assuming compression always improves latency without recall cost
comparing HNSW and IVF-PQ without equal recall targets
ignoring PQ training data quality
forgetting that HNSW can also use compression in some systems

Summary

HNSW and IVF-PQ solve the same broad problem with different trade-offs. HNSW uses layered graph traversal to find approximate nearest neighbors quickly. IVF-PQ uses centroid-based cluster probing plus product quantization to reduce the amount of data searched and stored.

Choose HNSW when latency and recall are the top priorities and memory is available. Consider IVF-PQ when memory or storage efficiency is critical and the application can tolerate compression-aware tuning, over-fetching, or rescoring to preserve result quality.