What Is IVF-PQ?

IVF-PQ is an approximate nearest neighbor index pattern for vector search. It combines inverted file clustering with product quantization, so search can inspect only selected vector clusters and use compressed vector representations inside those clusters.

The goal is to reduce memory and storage while keeping retrieval fast enough for large vector collections.

Short Answer

IVF-PQ is a vector index that first partitions vectors into clusters, then compresses vectors inside those clusters using product quantization.

At query time, it probes the most relevant clusters, scores compressed vector codes, and may rescore a smaller candidate set with full-precision vectors to improve recall.

What IVF Means

IVF stands for inverted file.

In vector search, IVF divides the vector space into clusters. Each cluster has a representative centroid, and vectors assigned to that centroid are stored in a posting list.

Instead of scanning all vectors, a query searches only the posting lists whose centroids are closest to the query.
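As a rough illustration of that probing step, the sketch below keeps full vectors in each posting list and scans only the lists whose centroids are closest to the query. The function names, the toy centroids, and the 2-D data are assumptions made for the example, not part of any particular library.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroids(query, centroids, nprobe):
    """Return the ids of the nprobe centroids closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda c: squared_distance(query, centroids[c]))
    return order[:nprobe]

def ivf_search(query, centroids, posting_lists, nprobe, k):
    """Scan only the posting lists of the nprobe closest centroids."""
    candidates = []
    for c in nearest_centroids(query, centroids, nprobe):
        for vec_id, vec in posting_lists[c]:
            candidates.append((squared_distance(query, vec), vec_id))
    candidates.sort()
    return [vec_id for _, vec_id in candidates[:k]]

# Toy data (hypothetical): three hand-picked centroids and their posting lists.
centroids = [(0.0, 0.0), (10.0, 10.0), (-10.0, 5.0)]
posting_lists = {
    0: [(0, (0.5, -0.2)), (1, (1.0, 1.0))],
    1: [(2, (9.0, 10.5)), (3, (11.0, 9.5))],
    2: [(4, (-9.5, 4.0))],
}

# With nprobe=1 only the posting list of centroid 0 is scanned.
result = ivf_search((0.8, 0.9), centroids, posting_lists, nprobe=1, k=2)
print(result)
```

Vectors 2, 3, and 4 are never compared against the query here, which is where the savings come from.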

What PQ Means

PQ stands for product quantization.

Product quantization compresses vectors by splitting each vector into smaller segments or subspaces. Each segment is mapped to the nearest learned centroid in that segment's codebook, and only the index of that centroid is stored. The stored vector becomes a sequence of compact codes rather than the original floating-point values.

This can reduce memory dramatically, but it is lossy.
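A minimal sketch of the encoding step follows. The codebooks here are hand-picked for readability rather than learned, and the names pq_encode and pq_decode are illustrative assumptions. Decoding shows the lossy part: the reconstructed vector is close to the original but not equal to it.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split_segments(vector, m):
    """Split a vector into m equal-length segments."""
    if len(vector) % m != 0:
        raise ValueError("vector length must be divisible by m")
    seg_len = len(vector) // m
    return [vector[i * seg_len:(i + 1) * seg_len] for i in range(m)]

def pq_encode(vector, codebooks):
    """For each segment, store the index of the nearest codebook centroid."""
    segments = split_segments(vector, len(codebooks))
    return [
        min(range(len(cb)), key=lambda j: squared_distance(seg, cb[j]))
        for seg, cb in zip(segments, codebooks)
    ]

def pq_decode(codes, codebooks):
    """Rebuild an approximate vector by concatenating the chosen centroids."""
    out = []
    for code, cb in zip(codes, codebooks):
        out.extend(cb[code])
    return out

# Hypothetical codebooks: 2 segments, 2 centroids each (real ones are learned).
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],   # codebook for segment 0
    [(0.0, 1.0), (1.0, 0.0)],   # codebook for segment 1
]
vector = (0.9, 1.2, 0.8, 0.1)
codes = pq_encode(vector, codebooks)    # two small integers instead of four floats
approx = pq_decode(codes, codebooks)    # lossy reconstruction
print(codes, approx)
```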

How IVF and PQ Work Together

IVF-PQ uses IVF to shrink the search space and PQ to shrink the size of each stored candidate.

The IVF part answers: which clusters should this query search?

The PQ part answers: how can candidates inside those clusters be stored and scored using less memory?

Together, they make large vector search more resource efficient.

How IVF-PQ Is Built

Building an IVF-PQ index usually involves:

training or choosing cluster centroids for the IVF layer
assigning each vector to a cluster or posting list
training PQ codebooks on vector segments
encoding each vector as compact PQ codes
storing the codes inside the relevant posting lists

Both the cluster training and PQ training should represent the actual data distribution.
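The build steps above can be sketched end to end with a small hand-written k-means. This is a toy under stated assumptions: the function names are invented for the example, the data is random, and raw vectors are encoded directly. Some implementations instead encode the residual between each vector and its cluster centroid, which this sketch leaves out.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))

def kmeans(points, k, iterations=10, seed=0):
    """Plain Lloyd-style k-means, initialised from k sampled points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the previous centroid if a cluster went empty
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def build_ivfpq(vectors, nlist, m, ksub):
    dim = len(vectors[0])
    if dim % m != 0:
        raise ValueError("dimension must be divisible by m")
    seg_len = dim // m
    # 1. Train centroids for the IVF layer.
    coarse = kmeans(vectors, nlist)
    # 2. Train one PQ codebook per vector segment.
    codebooks = []
    for s in range(m):
        segments = [v[s * seg_len:(s + 1) * seg_len] for v in vectors]
        codebooks.append(kmeans(segments, ksub))
    # 3. Encode each vector and store its codes in its cluster's posting list.
    posting_lists = {c: [] for c in range(nlist)}
    for vec_id, v in enumerate(vectors):
        codes = [nearest(v[s * seg_len:(s + 1) * seg_len], codebooks[s])
                 for s in range(m)]
        posting_lists[nearest(v, coarse)].append((vec_id, codes))
    return coarse, codebooks, posting_lists

# Random stand-in data: 200 vectors of dimension 8.
rng = random.Random(42)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
coarse, codebooks, posting_lists = build_ivfpq(vectors, nlist=4, m=4, ksub=8)
print({c: len(pl) for c, pl in posting_lists.items()})
```

Training here uses the same vectors that are later indexed, which is the simplest way to keep the training data representative.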

What nlist Means

nlist usually refers to the number of IVF clusters or posting lists.

More clusters make each posting list smaller, so fewer vectors are scanned per probed cluster. With a fixed nprobe, however, a smaller share of the data is searched, so nprobe often has to rise to maintain recall. Fewer clusters create larger posting lists, which increases per-query scan work.
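A back-of-the-envelope estimate makes the trade-off concrete. It assumes clusters are roughly balanced, which real data rarely satisfies exactly, and the function name is an assumption for the example.

```python
def expected_scanned(n_vectors, nlist, nprobe):
    """Rough number of vectors scanned per query, assuming balanced clusters."""
    return n_vectors * nprobe / nlist

n_vectors = 1_000_000
few_lists = expected_scanned(n_vectors, nlist=100, nprobe=10)     # 100000.0
many_lists = expected_scanned(n_vectors, nlist=1000, nprobe=10)   # 10000.0
print(few_lists, many_lists)
```

With the same nprobe, ten times more clusters means about a tenth of the scan work, but also a tenth of the data coverage.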

What nprobe Means

nprobe usually refers to the number of clusters searched per query.

Higher nprobe searches more posting lists and usually improves recall. Lower nprobe searches fewer posting lists and usually lowers latency.

This is one of the central tuning knobs in IVF-style search.
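One way to choose nprobe is to measure recall against an exact search on a sample of queries. The sketch below does this for the IVF layer alone, with full vectors in the posting lists. The random data and the use of sampled data points as stand-in centroids are assumptions; a real index would use trained centroids and real queries.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def top_k(query, items, k):
    """items yields (id, vector); return the ids of the k closest vectors."""
    scored = sorted((squared_distance(query, v), i) for i, v in items)
    return [i for _, i in scored[:k]]

def build_ivf(vectors, centroids):
    """Assign every vector to the posting list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        c = min(range(len(centroids)),
                key=lambda j: squared_distance(v, centroids[j]))
        lists[c].append((i, v))
    return lists

def ivf_search(query, centroids, lists, nprobe, k):
    order = sorted(range(len(centroids)),
                   key=lambda j: squared_distance(query, centroids[j]))
    candidates = [item for c in order[:nprobe] for item in lists[c]]
    return top_k(query, candidates, k)

rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(500)]
queries = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(20)]
centroids = rng.sample(vectors, 8)   # stand-in for trained centroids (nlist = 8)
lists = build_ivf(vectors, centroids)

k = 10
# Ground truth from an exact scan over all vectors.
truth = [set(top_k(q, enumerate(vectors), k)) for q in queries]

recall_by_nprobe = {}
for nprobe in (1, 2, 4, 8):
    hits = sum(len(truth[qi] & set(ivf_search(q, centroids, lists, nprobe, k)))
               for qi, q in enumerate(queries))
    recall_by_nprobe[nprobe] = hits / (k * len(queries))
    print(nprobe, round(recall_by_nprobe[nprobe], 3))
```

When nprobe equals the number of clusters, every posting list is scanned and the result matches the exact search. Lower values trade recall for less scan work.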

What PQ Codes Are

PQ codes are compact identifiers that approximate parts of the original vector.

Instead of storing every vector dimension as a full floating-point number, product quantization stores the index of the closest learned centroid for each vector segment.

The code is much smaller than the original vector, but it cannot represent all original distance information exactly.
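A common way to score codes without decompressing them is to precompute, for each segment, the distance from the query segment to every centroid in that segment's codebook, then sum table lookups. This is often called asymmetric distance computation. The sketch below uses hand-picked codebooks and illustrative function names, and it shows the approximate distance differing from the exact one.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_lookup_table(query, codebooks):
    """table[s][j] = squared distance from query segment s to centroid j."""
    m = len(codebooks)
    seg_len = len(query) // m
    table = []
    for s, cb in enumerate(codebooks):
        q_seg = query[s * seg_len:(s + 1) * seg_len]
        table.append([squared_distance(q_seg, centroid) for centroid in cb])
    return table

def approximate_distance(table, codes):
    """Sum one table entry per segment; no floating-point vector is read."""
    return sum(table[s][code] for s, code in enumerate(codes))

# Hypothetical codebooks: 2 segments, 2 centroids each.
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 1.0), (1.0, 0.0)],
]
original = (0.9, 1.2, 0.8, 0.1)
codes = [1, 1]                     # nearest centroid per segment for `original`
query = (1.0, 1.0, 0.0, 0.0)

table = build_lookup_table(query, codebooks)
approx = approximate_distance(table, codes)   # distance to the decoded vector
exact = squared_distance(query, original)     # distance to the original vector
print(approx, exact)
```

The table is built once per query and reused for every code in the probed posting lists, so scoring a candidate costs one lookup per segment.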

Why IVF-PQ Saves Memory

IVF-PQ saves memory because compressed PQ codes are much smaller than full vectors.

For high-dimensional embeddings, this can be a major difference. A 768-dimensional float32 vector, for example, takes 3,072 bytes, while its PQ code may take under 100 bytes, depending on the number of segments and the bits per code.

This is why IVF-PQ is attractive for very large vector collections.
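The arithmetic is easy to check. The figures below (768 dimensions, float32 values, 96 segments, 8 bits per code, 10 million vectors) are assumed for illustration. The estimate ignores codebooks, centroids, and vector IDs, which add overhead.

```python
def full_vector_bytes(dim, bytes_per_value=4):
    """Size of one uncompressed vector (float32 by default)."""
    return dim * bytes_per_value

def pq_code_bytes(m, bits_per_code=8):
    """Size of one PQ code: m segment codes, rounded up to whole bytes."""
    return (m * bits_per_code + 7) // 8

dim, m, n_vectors = 768, 96, 10_000_000

full = full_vector_bytes(dim)      # 3072 bytes per vector
code = pq_code_bytes(m)            # 96 bytes per vector
ratio = full / code                # 32.0

total_full = n_vectors * full      # 30,720,000,000 bytes (about 30.7 GB)
total_code = n_vectors * code      # 960,000,000 bytes (about 0.96 GB)
print(full, code, ratio, total_full, total_code)
```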

Why IVF-PQ Is Approximate

IVF-PQ has two sources of approximation.

IVF may skip clusters that contain true nearest neighbors.
PQ may distort vector distances because compressed codes are approximate.

That means IVF-PQ can be very efficient, but recall depends on careful tuning.

Over-Fetching and Rescoring

Many IVF-PQ systems improve recall by over-fetching candidates.

The first pass uses compressed codes to find a candidate set larger than the final result size. Then the system fetches the original vectors for the best candidates and recalculates exact or higher-quality distances.

This rescoring step can recover ranking quality, but it adds extra work and, when the full vectors are kept on disk, extra disk reads.
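A minimal sketch of over-fetching and rescoring follows. The approximate ranking is supplied by hand as a hypothetical output of a compressed-code search, and the rescore function and its overfetch parameter are names invented for the example.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rescore(query, approx_ranked_ids, full_vectors, k, overfetch):
    """Re-rank the top k * overfetch approximate hits by exact distance."""
    shortlist = approx_ranked_ids[:k * overfetch]
    exact = sorted((squared_distance(query, full_vectors[i]), i)
                   for i in shortlist)
    return [i for _, i in exact[:k]]

# Full-precision vectors, fetched only for the shortlist.
full_vectors = {
    0: (1.0, 1.0),
    1: (0.1, 0.0),
    2: (0.4, 0.4),
    3: (5.0, 5.0),
}
query = (0.0, 0.0)

# Hypothetical ranking from compressed codes: the true nearest neighbor
# (id 1) was pushed to second place by quantization error.
approx_ranked_ids = [2, 1, 0, 3]

without_rescoring = approx_ranked_ids[:1]
with_rescoring = rescore(query, approx_ranked_ids, full_vectors, k=1, overfetch=3)
print(without_rescoring, with_rescoring)
```

Only three full vectors are read here instead of four. On a real index the shortlist is a tiny fraction of the collection, which is what keeps rescoring affordable.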

IVF-PQ vs IVFFlat

IVFFlat and IVF-PQ both use IVF cluster probing.

IVFFlat scans full vectors inside selected clusters. IVF-PQ scans compressed codes inside selected clusters and may rescore a subset with full vectors.

IVFFlat usually has less compression distortion. IVF-PQ usually uses less memory or storage.

IVF-PQ vs HNSW

HNSW uses graph traversal. IVF-PQ uses cluster probing plus compression.

HNSW is often chosen for low-latency, high-recall in-memory search. IVF-PQ is often considered when the dataset is large enough that memory or storage efficiency becomes more important than peak recall or lowest possible latency.

Recall Trade-Offs

IVF-PQ recall depends on:

cluster quality
number of clusters probed
PQ codebook quality
number of PQ segments
candidate over-fetching
whether full-vector rescoring is used

More aggressive compression can reduce recall unless it is offset by probing more clusters or by rescoring.

Latency Trade-Offs

IVF-PQ can lower latency by scanning less data and using compact codes.

But probing many clusters, over-fetching many candidates, or rescoring many full vectors can increase latency. The practical performance depends on the balance among these settings.

When IVF-PQ Works Well

IVF-PQ works well when:

the dataset is very large
memory or storage is a major constraint
some approximation is acceptable
vectors cluster reasonably well
PQ training data is representative
rescoring can recover enough ranking quality

When IVF-PQ May Be a Poor Fit

IVF-PQ may be a poor fit when:

exact or near-exact recall is required
the dataset is small enough for simpler search
the data distribution changes often
PQ codebooks cannot be trained well
metadata filters make cluster probing unreliable, for example when few vectors in the probed clusters pass the filter
latency cannot tolerate rescoring or extra probing

Common Misunderstandings

Common misunderstandings include:

thinking IVF-PQ and IVFFlat are the same
forgetting that PQ is lossy
assuming compression only affects memory and not recall
choosing nprobe without measuring recall
training PQ on unrepresentative data
ignoring the cost of rescoring full vectors

Summary

IVF-PQ is a cluster-based, compressed ANN index pattern. IVF narrows search to selected posting lists. PQ compresses vectors into compact codes for memory-efficient candidate scoring.

Its main strength is scaling large vector collections with lower memory or storage requirements. Its main trade-off is additional approximation, which must be managed with cluster probing, PQ configuration, over-fetching, and rescoring.