Product quantization compresses vectors by turning long floating-point embeddings into short sequences of learned codes. It does this without treating the whole vector as one unit. Instead, it divides the vector into smaller pieces and compresses each piece separately.
This article walks through the compression pipeline step by step.
The Basic Idea
A vector embedding is usually a list of numeric dimensions. Product quantization replaces groups of those numbers with compact identifiers.
The result is a compressed vector that is much smaller than the original. The trade-off is that the compressed vector is approximate, not exact.
Step 1: Start With Full-Precision Vectors
PQ begins with normal embeddings.
For example, a vector might have 128, 384, 768, or 1536 dimensions. Each dimension is commonly stored as a 32-bit float.
A 128-dimensional vector can be imagined like this:
[d1, d2, d3, d4, ..., d128]
Product quantization compresses this list into a shorter representation.
Step 2: Split the Vector Into Segments
The vector is divided into equal-sized segments, also called sub-vectors or subspaces.
For example:
128 dimensions
32 segments
4 dimensions per segment
The first segment contains dimensions 1 through 4. The second segment contains dimensions 5 through 8, and so on.
Each segment will be compressed independently.
Step 3: Gather Training Examples
PQ needs examples before it can compress well.
The system samples vectors from the collection and applies the same segmentation process to them. These training segments show the common shapes and value patterns in the data.
The training set should represent the vectors that will later be indexed and searched.
Step 4: Learn Centroids for Each Segment Position
For each segment position, the system learns a set of centroids.
A centroid is a representative segment value. It stands in for many similar segment values found in the training data.
If the system uses 256 centroids for a segment position, then every possible segment at that position will be approximated by one of those 256 representatives.
Step 5: Build Codebooks
The centroids for a segment position form a codebook.
Each segment position has its own codebook because different parts of a vector may have different value distributions.
The codebook maps compact IDs to learned centroids:
code 0 -> centroid A
code 1 -> centroid B
...
code 255 -> centroid Z
This codebook is what makes compression and later approximation possible.
Step 6: Encode Each Vector Segment
After training, PQ compresses each stored vector.
For every segment in a vector, the system finds the closest centroid in that segment position’s codebook. It then stores the centroid ID instead of the original segment values.
A four-dimensional segment that originally required several floating-point numbers may become a single one-byte code.
Step 7: Store the Sequence of Codes
After all segments are encoded, the vector becomes a sequence of codes.
For example, 32 segments may become 32 one-byte values:
[17, 203, 88, 4, ..., 112]
Those codes are the compressed vector representation.
Step 8: Keep the Codebooks
The compressed codes are only meaningful together with the codebooks.
The codebooks tell the system what each code means for each segment position. Without the codebooks, the stored codes cannot be interpreted.
This is why PQ memory usage includes both compressed codes and codebook overhead.
Step 9: Approximate Distances
During search, the system can estimate distances using the compressed codes and codebooks.
One approach reconstructs approximate vector segments from the stored codes and compares those approximations. Another approach uses lookup tables to estimate distances directly from compressed representations.
Either way, the distance is approximate because the original vector values were replaced by centroids.
Step 10: Optionally Rescore Candidates
Some systems store or fetch the original full-precision vectors for final ranking.
In that design, PQ is used to find likely candidates cheaply. Then the best candidates are rescored using original vectors.
This helps recover ranking quality while still reducing the memory needed for broad candidate search.
A Compression Example
Suppose a vector has 128 dimensions and each dimension uses 4 bytes.
128 x 4 bytes = 512 bytes
If PQ splits the vector into 32 segments and stores one byte per segment, the compressed code is:
32 x 1 byte = 32 bytes
The codebooks add overhead, and original vectors may still be stored elsewhere for rescoring, but the searchable compressed representation is much smaller.
Why Segmentation Matters
Segmentation is the key design choice in PQ.
If segments are too large, each centroid has to represent too much variation. That can increase distortion.
If segments are too small or too numerous, recall may improve, but memory savings shrink because more codes are stored per vector.
Why Training Quality Matters
PQ works best when the codebooks match the vector distribution.
If the training vectors are representative, the centroids can approximate future vectors well. If the distribution shifts, old centroids may become less accurate.
This is why product quantization should be tested again after major embedding model changes or large changes in data composition.
Where Information Is Lost
Information is lost when each original segment is replaced by its nearest centroid.
The compressed code keeps the general region of the segment, but not the exact original coordinates.
Many different segments can share the same code. That makes PQ compact, but it also creates distance distortion.
How Compression Affects Search
Compressed vectors can make search cheaper because the system compares smaller representations.
However, approximate distances can change candidate order. A relevant vector might look less relevant after compression, while a less relevant vector might look closer.
Good PQ configurations balance memory savings with enough detail to preserve useful nearest-neighbor behavior.
What Engineers Tune
Common tuning choices include:
- number of segments
- dimensions per segment
- number of centroids per segment
- training sample size
- candidate pool size
- whether to rescore with original vectors
- how often to retrain after data distribution changes
Common Mistakes
Common mistakes include:
- using too little training data
- training on unrepresentative vectors
- assuming higher compression has no recall cost
- forgetting codebook overhead
- not testing filtered queries
- not measuring ranking quality after compression
Summary
Product quantization compresses vectors by splitting each embedding into segments, learning centroids for each segment position, and storing compact centroid IDs instead of full floating-point values.
The compressed representation is much smaller, but it is approximate. The quality of PQ compression depends on segment choices, codebook training, candidate selection, rescoring strategy, and benchmark results on real search workloads.