What Is a PQ Index?

A PQ index is a vector search index that uses product quantization to store, score, or narrow candidate vectors with compressed representations.

The term can mean different things depending on the system. In practice, it usually refers to an approximate nearest neighbor index that combines a routing structure with PQ-compressed vector codes.

Short Definition

A PQ index is an ANN index that uses product quantization codes as part of vector retrieval.

Instead of relying only on full-precision vectors during search, the index can use compact centroid-ID codes to estimate distances, reduce memory, and scan more candidates efficiently.

Why PQ Indexes Exist

Large vector collections are expensive to keep in memory.

Each embedding may contain hundreds or thousands of floating-point dimensions. A PQ index reduces the search-time memory footprint by storing compressed codes for those vectors.

This makes large-scale nearest neighbor search more practical when memory cost is a limiting factor.

What a PQ Index Contains

A PQ index usually contains several parts:

  • compressed PQ codes for stored vectors
  • codebooks that map codes to learned centroids
  • a routing structure such as a graph, cluster index, or candidate list
  • metadata needed to map candidates back to stored objects
  • optionally, original vectors for final rescoring

The exact layout depends on whether PQ is combined with HNSW, IVF, flat search, or another ANN strategy.

PQ Index vs PQ Code

A PQ code is the compressed representation of one vector.

A PQ index is the search structure that uses many PQ codes to retrieve nearest neighbors.

In other words, PQ codes are stored data. The PQ index is the system that searches through them.

PQ Index vs Product Quantizer

A product quantizer learns codebooks and encodes vectors.

A PQ index uses the quantizer’s output during search. It may also store the codebooks needed to interpret those codes.

The quantizer creates the compressed representation. The index uses that representation for retrieval.

How a PQ Index Is Built

A typical build process looks like this:

  • collect representative training vectors
  • split vectors into segments
  • train codebooks for each segment position
  • encode stored vectors into PQ codes
  • build or update the ANN routing structure
  • store codebooks, codes, object IDs, and optional full vectors

Some systems train PQ before building the main index. Others build an index first and then compress vector representations after enough training data is available.

PQ With Graph Indexes

In a graph-based PQ index, the graph controls which vectors are explored.

The vector values used during traversal may be compressed with PQ. This reduces memory, but approximate distances can affect which graph nodes are visited.

Graph-plus-PQ designs are often useful when low latency is important but full-vector memory cost is too high.

PQ With IVF Indexes

In an IVF-PQ-style index, IVF provides cluster routing and PQ compresses vectors inside the selected clusters.

The query first chooses likely clusters or posting lists. Then the index compares the query against PQ codes within those candidate regions.

This design is common when memory efficiency and large-scale candidate scanning are priorities.

PQ With Flat Search

A PQ index can also be used with a flat candidate scan.

Instead of scanning full vectors, the system scans compressed PQ codes. This can be useful when the candidate set is bounded by a tenant, filter, shard, or previous retrieval stage.

Flat PQ search is still approximate unless full vectors are used for final scoring.

Query Flow

A PQ index usually follows this query flow:

  • receive or generate a query vector
  • use the routing structure to identify candidate regions or nodes
  • estimate distances using PQ codes and codebooks
  • keep a candidate bucket larger than the final result count
  • optionally fetch full vectors for rescoring
  • return the final nearest neighbors

The first pass is approximate. Rescoring can improve final ranking if original vectors are available.

Why Rescoring Matters

PQ codes approximate the original vector space.

When the index scores candidates with compressed codes, distances may be distorted. Rescoring recomputes distances for a smaller candidate set using full vectors or a more accurate representation.

This can recover recall and improve ranking quality, but it adds read and compute cost.

Memory Benefits

The main benefit of a PQ index is memory reduction.

For example, a vector stored as hundreds of 32-bit floats may become a much smaller sequence of one-byte segment codes.

The index still needs codebooks, routing data, metadata, and sometimes original vectors, but the searchable representation can shrink dramatically.

Recall Trade-Off

A PQ index can lose recall because compressed distances are approximate.

Relevant vectors can be missed if they appear too far away in compressed space or never enter the candidate bucket.

Recall can often be improved by using better training data, more segments, larger candidate pools, more probes, higher graph search breadth, or final rescoring.

Latency Trade-Off

A PQ index can reduce latency by comparing smaller codes and improving cache behavior.

It can also increase latency if the system expands too many candidates or performs expensive rescoring from disk.

The practical latency depends on the routing index, compression settings, candidate size, and storage layout.

Build-Time Trade-Off

PQ indexes have build-time costs that simpler indexes do not.

The system must train codebooks, encode vectors, and keep compression settings compatible with the index. If the vector distribution changes, retraining may be needed.

Build time and conversion time should be part of the evaluation, especially for large or frequently updated collections.

When a PQ Index Helps

A PQ index helps when:

  • the collection is large
  • vectors are high dimensional
  • memory is the main bottleneck
  • approximate candidate search is acceptable
  • full-vector rescoring is available when quality matters
  • the system has enough representative data to train codebooks

When a PQ Index May Be Risky

A PQ index may be risky when:

  • the collection is small enough for exact search
  • recall must be extremely high
  • the vector distribution changes frequently
  • filters leave very small candidate sets
  • full vectors are not available for rescoring
  • compression settings are chosen without benchmarking

What to Benchmark

Benchmark a PQ index with:

  • memory before and after compression
  • recall at the target k
  • p50, p95, and p99 latency
  • queries per second under concurrency
  • candidate bucket size
  • rescoring cost
  • index build and conversion time
  • filtered-query performance
  • quality after new data is added

Summary

A PQ index is a vector search index that uses product-quantized representations during retrieval. It may combine PQ codes with graph traversal, IVF cluster routing, or flat scans.

The benefit is lower memory usage and cheaper candidate scoring. The trade-off is approximate distance estimation, so recall, latency, build time, and rescoring behavior must be tested on real queries.