IVF-PQ is an approximate nearest neighbor index pattern for vector search. It combines inverted file clustering with product quantization, so search can inspect only selected vector clusters and use compressed vector representations inside those clusters.
The goal is to reduce memory and storage while keeping retrieval fast enough for large vector collections.
Short Answer
IVF-PQ is a vector index that first partitions vectors into clusters, then compresses vectors inside those clusters using product quantization.
At query time, it probes the most relevant clusters, scores compressed vector codes, and may rescore a smaller candidate set with full-precision vectors to improve recall.
What IVF Means
IVF stands for inverted file.
In vector search, IVF divides the vector space into clusters. Each cluster has a representative centroid, and vectors assigned to that centroid are stored in a posting list.
Instead of scanning all vectors, a query searches only the posting lists whose centroids are closest to the query.
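The probing idea can be illustrated with a minimal sketch. The function names, the toy centroids, and the posting-list layout below are assumptions made for illustration, not the API of any particular library; it uses squared Euclidean distance and stores full vectors so that only the IVF part is shown.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroids(query, centroids, nprobe):
    """Return the indices of the nprobe centroids closest to the query."""
    order = sorted(range(len(centroids)), key=lambda i: sq_dist(query, centroids[i]))
    return order[:nprobe]

def ivf_search(query, centroids, posting_lists, nprobe, k):
    """Scan only the posting lists of the closest centroids and return the top-k ids."""
    candidates = []
    for c in nearest_centroids(query, centroids, nprobe):
        for vec_id, vec in posting_lists[c]:
            candidates.append((sq_dist(query, vec), vec_id))
    candidates.sort()
    return [vec_id for _, vec_id in candidates[:k]]

# Toy data (hypothetical): three hand-picked centroids with a few vectors each.
centroids = [(0.0, 0.0), (10.0, 10.0), (-10.0, 5.0)]
posting_lists = {
    0: [("a", (0.5, 0.2)), ("b", (-0.3, 1.0))],
    1: [("c", (9.5, 10.4)), ("d", (11.0, 9.0))],
    2: [("e", (-9.0, 5.5))],
}
result = ivf_search((9.0, 9.0), centroids, posting_lists, nprobe=1, k=2)
```

With nprobe=1 only the posting list of the second centroid is scanned, so vectors in the other two lists are never compared against the query.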
What PQ Means
PQ stands for product quantization.
Product quantization compresses vectors by splitting each vector into smaller segments or subspaces. Each segment is mapped to a learned centroid from a codebook. The stored vector becomes a sequence of compact codes rather than the original floating-point values.
This can reduce memory dramatically, but it is lossy.
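As a rough illustration of encoding and why it is lossy, the sketch below splits a vector into segments and replaces each segment with the index of its nearest codebook centroid. The codebooks here are hand-written toy values rather than trained ones, and the names pq_encode and pq_decode are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split(vec, m):
    """Split a vector into m equal-length segments."""
    step = len(vec) // m
    return [tuple(vec[i * step:(i + 1) * step]) for i in range(m)]

def pq_encode(vec, codebooks):
    """Map each segment to the index of its nearest centroid in that segment's codebook."""
    segments = split(vec, len(codebooks))
    return [
        min(range(len(book)), key=lambda j: sq_dist(seg, book[j]))
        for seg, book in zip(segments, codebooks)
    ]

def pq_decode(codes, codebooks):
    """Rebuild an approximate vector by concatenating the chosen centroids."""
    out = []
    for code, book in zip(codes, codebooks):
        out.extend(book[code])
    return out

# Toy codebooks (assumed, not trained): 2 segments, 2 centroids per segment.
codebooks = [[(0.0, 0.0), (1.0, 1.0)], [(0.0, 1.0), (1.0, 0.0)]]
vec = [0.9, 0.8, 0.1, 0.7]
codes = pq_encode(vec, codebooks)        # two small integers instead of four floats
approx = pq_decode(codes, codebooks)     # close to vec, but not equal to it
```

The decoded vector differs from the original, which is the information loss that later shows up as distance distortion.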
How IVF and PQ Work Together
IVF-PQ uses IVF to reduce the search space and PQ to shrink the stored representation of each candidate.
The IVF part answers: which clusters should this query search?
The PQ part answers: how can candidates inside those clusters be stored and scored using less memory?
Together, they make large vector search more resource efficient.
How IVF-PQ Is Built
Building an IVF-PQ index usually involves:
- training or choosing cluster centroids for the IVF layer
- assigning each vector to a cluster or posting list
- training PQ codebooks on vector segments
- encoding each vector as compact PQ codes
- storing the codes inside the relevant posting lists
Both the cluster training and PQ training should represent the actual data distribution.
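The build steps above can be sketched end to end on toy data. This is a simplified illustration under several assumptions: a tiny Lloyd's k-means seeded with the first k points, PQ codes computed on the raw vectors, and hypothetical names such as build_ivfpq and ksub (centroids per codebook). Some implementations instead encode the residual between a vector and its cluster centroid; this sketch does not.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))

def kmeans(points, k, iters=10):
    """Tiny Lloyd's k-means, seeded with the first k points for determinism."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(col) / len(g) for col in zip(*g))
    return centroids

def segment(v, s, step):
    return v[s * step:(s + 1) * step]

def build_ivfpq(vectors, nlist, m, ksub):
    """Train coarse centroids and per-segment codebooks, then store codes in posting lists."""
    step = len(vectors[0]) // m
    coarse = kmeans(vectors, nlist)
    codebooks = [kmeans([segment(v, s, step) for v in vectors], ksub) for s in range(m)]
    posting_lists = {i: [] for i in range(nlist)}
    for vec_id, v in enumerate(vectors):
        codes = [nearest(segment(v, s, step), codebooks[s]) for s in range(m)]
        posting_lists[nearest(v, coarse)].append((vec_id, codes))
    return coarse, codebooks, posting_lists

# Toy data (hypothetical): two well-separated groups of 4-dimensional vectors.
vectors = [
    (0.0, 0.1, 0.0, 0.2), (5.0, 5.1, 5.0, 4.9), (0.1, 0.0, 0.2, 0.1),
    (5.1, 4.9, 5.2, 5.0), (0.2, 0.1, 0.1, 0.0), (4.9, 5.0, 5.1, 5.1),
]
coarse, codebooks, posting_lists = build_ivfpq(vectors, nlist=2, m=2, ksub=2)
```

Training here uses the indexed vectors themselves. In practice a representative sample is often used instead, which is why an unrepresentative sample degrades both the clustering and the codebooks.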
What nlist Means
nlist usually refers to the number of IVF clusters or posting lists.
More clusters mean smaller posting lists, so each probed cluster scans fewer vectors, but a query's true neighbors are then spread across more lists and may require probing more of them to maintain recall. Fewer clusters create larger posting lists, which can increase per-query scan work.
What nprobe Means
nprobe usually refers to the number of clusters searched per query.
Higher nprobe searches more posting lists and usually improves recall. Lower nprobe searches fewer posting lists and usually lowers latency.
This is one of the central tuning knobs in IVF-style search.
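A back-of-the-envelope estimate shows how nlist and nprobe interact. The sketch below assumes evenly sized posting lists, which real data rarely gives exactly, and the function name is hypothetical.

```python
def expected_scanned(n_vectors, nlist, nprobe):
    """Rough number of stored vectors scanned per query, assuming balanced posting lists."""
    return n_vectors * min(nprobe, nlist) / nlist

# Illustrative numbers only: 1,000,000 vectors, 1,000 clusters, 10 probed per query.
scanned = expected_scanned(1_000_000, nlist=1000, nprobe=10)
```

Under that assumption the query touches about nprobe / nlist of the collection, so doubling nprobe roughly doubles scan work while giving true neighbors more chances to be found.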
What PQ Codes Are
PQ codes are compact identifiers that approximate parts of the original vector.
Instead of storing every vector dimension as a full floating-point number, product quantization stores the closest learned centroid for each vector segment.
The code is much smaller than the original vector, but it cannot represent all original distance information exactly.
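One common way to score PQ codes against a query, sketched here under toy assumptions, is to precompute a small table of distances from each query segment to every centroid in that segment's codebook. The approximate distance to any stored code is then a sum of table lookups. The codebooks and function names below are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_tables(query, codebooks):
    """For each segment, squared distances from the query segment to every centroid."""
    step = len(query) // len(codebooks)
    return [
        [sq_dist(query[s * step:(s + 1) * step], c) for c in book]
        for s, book in enumerate(codebooks)
    ]

def approx_sq_dist(codes, tables):
    """Approximate squared distance to a stored code: one lookup per segment, summed."""
    return sum(table[code] for code, table in zip(codes, tables))

# Toy codebooks (assumed, not trained): 2 segments, 2 centroids per segment.
codebooks = [[(0.0, 0.0), (1.0, 1.0)], [(0.0, 1.0), (1.0, 0.0)]]
tables = build_tables((1.0, 1.0, 0.0, 0.0), codebooks)
score = approx_sq_dist([1, 0], tables)
```

The score is the distance from the query to the centroids the code points at, not to the original vector, which is why it cannot be exact.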
Why IVF-PQ Saves Memory
IVF-PQ saves memory because compressed PQ codes are much smaller than full vectors.
For high-dimensional embeddings, this can be a major difference. A full vector stored as 32-bit floats takes 4 bytes per dimension and may require thousands of bytes, while a compressed code often needs only about one byte per segment, a small fraction of that size.
This is why IVF-PQ is attractive for very large vector collections.
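The size difference can be worked out with simple arithmetic. The dimension, segment count, and code width below are illustrative assumptions, and the estimate ignores overhead such as codebooks, centroids, and vector ids.

```python
def full_vector_bytes(dim, bytes_per_value=4):
    """Size of one uncompressed vector, assuming 32-bit floats by default."""
    return dim * bytes_per_value

def pq_code_bytes(m, bits_per_code=8):
    """Size of one PQ code with m segments, rounded up to whole bytes."""
    return (m * bits_per_code + 7) // 8

# Illustrative configuration: 768 dimensions, 96 segments, 8 bits per segment.
full = full_vector_bytes(768)
code = pq_code_bytes(96)
ratio = full / code
```

With these assumed settings a vector shrinks from 3,072 bytes to 96 bytes, and using fewer segments or fewer bits per code would shrink it further at the cost of more distortion.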
Why IVF-PQ Is Approximate
IVF-PQ has two sources of approximation.
- IVF may skip clusters that contain true nearest neighbors.
- PQ may distort vector distances because compressed codes are approximate.
That means IVF-PQ can be very efficient, but recall depends on careful tuning.
Over-Fetching and Rescoring
Many IVF-PQ systems improve recall by over-fetching candidates.
The rough search uses compressed codes to find a larger candidate set. Then the system fetches the original vectors for the best candidates and recalculates exact or higher-quality distances.
This rescoring step can recover ranking quality, but it adds extra work and sometimes disk reads.
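A minimal sketch of over-fetching and rescoring follows. It assumes the compressed search has already produced a ranked list of candidate ids and that full vectors can be looked up by id; the rescore function, the overfetch factor, and the toy data are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rescore(query, approx_ranked_ids, full_vectors, k, overfetch=4):
    """Keep the top k * overfetch approximate candidates, then re-rank them exactly."""
    shortlist = approx_ranked_ids[:k * overfetch]
    reranked = sorted(shortlist, key=lambda i: sq_dist(query, full_vectors[i]))
    return reranked[:k]

# Toy data (hypothetical): PQ distortion ranked "a" first although "b" is truly closest.
full_vectors = {"a": (1.0, 0.0), "b": (0.1, 0.0), "c": (0.5, 0.0), "d": (3.0, 0.0)}
approx_ranked = ["a", "c", "b", "d"]
query = (0.0, 0.0)

without_overfetch = rescore(query, approx_ranked, full_vectors, k=1, overfetch=1)
with_overfetch = rescore(query, approx_ranked, full_vectors, k=1, overfetch=3)
```

Each id in the shortlist costs one full-vector fetch, so a larger overfetch factor trades extra reads and distance computations for a better chance of recovering the true neighbors.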
IVF-PQ vs IVFFlat
IVFFlat and IVF-PQ both use IVF cluster probing.
IVFFlat scans full vectors inside selected clusters. IVF-PQ scans compressed codes inside selected clusters and may rescore a subset with full vectors.
IVFFlat usually has less compression distortion. IVF-PQ usually uses less memory or storage.
IVF-PQ vs HNSW
HNSW uses graph traversal. IVF-PQ uses cluster probing plus compression.
HNSW is often chosen for low-latency, high-recall in-memory search. IVF-PQ is often considered when the dataset is large enough that memory or storage efficiency becomes more important than peak recall or lowest possible latency.
Recall Trade-Offs
IVF-PQ recall depends on:
- cluster quality
- number of clusters probed
- PQ codebook quality
- number of PQ segments
- candidate over-fetching
- whether full-vector rescoring is used
More aggressive compression saves more memory but can reduce recall unless it is offset by additional probing or rescoring.
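Because so many settings affect recall, it is usually measured directly rather than assumed. A simple sketch of recall@k is shown below, comparing the ids returned by the approximate index with the ids from an exact brute-force search over the same query. The function name and sample ids are illustrative.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k ids that also appear in the approximate top-k."""
    exact = set(exact_ids[:k])
    return len(exact & set(approx_ids[:k])) / len(exact)

# Hypothetical results for one query: the index missed one of the three true neighbors.
r = recall_at_k(["a", "b", "x"], ["a", "b", "c"], k=3)
```

Averaging this value over a representative set of queries, for each candidate setting, gives a basis for choosing how many clusters to probe and how much to over-fetch.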
Latency Trade-Offs
IVF-PQ can lower latency by scanning less data and using compact codes.
But probing many clusters, over-fetching many candidates, or rescoring many full vectors can increase latency. The practical performance depends on the balance among these settings.
When IVF-PQ Works Well
IVF-PQ works well when:
- the dataset is very large
- memory or storage is a major constraint
- some approximation is acceptable
- vectors cluster reasonably well
- PQ training data is representative
- rescoring can recover enough ranking quality
When IVF-PQ May Be a Poor Fit
IVF-PQ may be a poor fit when:
- exact or near-exact recall is required
- the dataset is small enough for simpler search
- the data distribution changes often
- PQ codebooks cannot be trained well
- filters make cluster probing unreliable
- latency cannot tolerate rescoring or extra probing
Common Misunderstandings
Common misunderstandings include:
- thinking IVF-PQ and IVFFlat are the same
- forgetting that PQ is lossy
- assuming compression only affects memory and not recall
- choosing nprobe without measuring recall
- training PQ on unrepresentative data
- ignoring the cost of rescoring full vectors
Summary
IVF-PQ is a cluster-based, compressed ANN index pattern. IVF narrows search to selected posting lists. PQ compresses vectors into compact codes for memory-efficient candidate scoring.
Its main strength is scaling large vector collections with lower memory or storage requirements. Its main trade-off is additional approximation, which must be managed with cluster probing, PQ configuration, over-fetching, and rescoring.