How Does the HNSW Algorithm Work?

The HNSW algorithm works by building a multi-layer graph of vectors, then searching that graph from coarse upper layers down to a dense bottom layer. This lets the search make large jumps first and then refine locally near the query vector.

HNSW is popular because it can find close neighbors much faster than comparing a query with every vector in a collection.

Short Answer

HNSW works in two main phases:

  1. During indexing, it inserts each vector into a layered graph and connects it to nearby vectors.
  2. During search, it starts near the top of the graph, moves toward closer vectors, descends layer by layer, and returns the closest candidates found at the bottom.

The result is fast approximate nearest neighbor search.

The Core Idea

HNSW avoids brute-force search.

Instead of checking every vector, it keeps a graph where nearby vectors are connected. Search becomes a navigation problem: start somewhere in the graph, follow connections toward vectors closer to the query, and stop when no nearby connection improves the result enough.

The hierarchy makes this navigation faster.

The Graph Structure

In HNSW, vectors are nodes in a graph. Edges connect nodes that are close to each other under the chosen distance metric.

The bottom layer contains all vectors. Higher layers contain fewer vectors. A vector may appear in multiple layers, but every vector appears in the bottom layer.

Upper layers act like express lanes. Lower layers act like local roads.

Why Layers Help

If the graph had only one dense layer, the search would need to move through many local connections.

With layers, HNSW can first move quickly across a sparse graph with long-range links. Once it reaches a promising region, it descends into denser layers to search more precisely.

This is why the algorithm is called hierarchical.

Search Step by Step

A simplified HNSW search looks like this:

  1. Start at an entry point in the highest available layer.
  2. Compare the query vector with the current node and its neighbors.
  3. Move to a neighbor if that neighbor is closer to the query.
  4. Repeat until no neighbor in that layer improves the position.
  5. Move down one layer and repeat the search.
  6. At the bottom layer, keep a candidate list of close vectors.
  7. Return the closest candidates from that list.

The upper layers guide the search. The bottom layer produces the final result candidates.

What the Entry Point Does

The entry point is where the search begins.

HNSW does not start from every vector. It starts from a known node in the upper graph and navigates from there. Because the upper graph is sparse and contains long-range connections, the search can move toward a good region quickly.

Greedy Search in Upper Layers

At upper layers, HNSW often behaves like a greedy search.

It asks: “Among the neighbors I can reach from here, is any one closer to the query?” If yes, it moves there. If not, it descends to the next layer.

This coarse search is fast because upper layers contain fewer nodes.

Candidate Search in the Bottom Layer

The bottom layer contains all vectors, so this is where final candidate quality matters most.

Instead of keeping only one best node, the algorithm keeps a candidate list. The size of this list controls how broadly the search explores nearby candidates.

A larger candidate list can improve recall. A smaller list can reduce latency.

What ef Controls

In many HNSW implementations, ef controls the size of the dynamic candidate list at query time.

Higher ef means the search explores more candidate nodes. This can improve recall but usually makes queries slower.

Lower ef means the search explores fewer candidates. This can be faster but may miss some true nearest neighbors.

How Insertion Works

When a new vector is inserted, HNSW searches the existing graph to find where the new vector belongs.

A simplified insertion process is:

  1. Choose the highest layer where the new vector will appear.
  2. Start from the current entry point.
  3. Search through upper layers to find a good region.
  4. At each relevant layer, find nearby existing nodes.
  5. Connect the new vector to selected neighbors.
  6. Update graph links so future searches can use the new node.

This makes HNSW suitable for systems where vectors are added over time, although inserts still have indexing cost.

How Layer Assignment Works

Not every vector appears in every layer.

Most vectors appear only in the bottom layer. Fewer vectors appear in each higher layer. This creates the hierarchy: dense at the bottom, sparse at the top.

The sparse upper layers are what let HNSW skip large regions of the dataset.

What maxConnections Controls

maxConnections controls how many graph links each node can keep.

More connections can make the graph easier to navigate and improve recall. But more connections also use more memory and can increase indexing or search work.

Too few connections can make the graph less navigable.

What efConstruction Controls

efConstruction affects how thoroughly HNSW searches for good neighbors while building the graph.

A higher value can build a better-connected graph, which can improve search quality later. The cost is slower indexing and sometimes more resource use.

This is a build-time quality setting, not a query-time setting.

Why HNSW Is Approximate

HNSW does not compare the query to every vector. It follows graph paths and explores a limited candidate set.

That is why it is approximate. It usually finds very close results quickly, but it may not always return the exact same top-k list as a brute-force scan.

The more thoroughly it searches, the closer it gets to exact behavior.

Recall and Latency Trade-Off

The main operational trade-off is recall versus latency.

  • Higher recall: explore more candidates, use stronger graph settings, accept more latency or memory use.
  • Lower latency: explore fewer candidates, use lighter settings, accept some recall loss.

Production systems need to choose settings based on user experience and retrieval quality requirements.

Why HNSW Works Well for Embeddings

Embeddings often live in high-dimensional spaces. A brute-force search over many high-dimensional vectors can be expensive.

HNSW works well because it turns the search into graph navigation. It uses approximate structure to avoid most unnecessary comparisons while still finding strong candidate neighbors.

Simple Analogy

Think of finding a house in a large city.

You do not inspect every house. You first travel to the right region, then the right neighborhood, then the right street, then the exact house.

HNSW does something similar: upper layers move toward the right region, lower layers refine the search locally.

Common Mistakes

Common mistakes include:

  • thinking HNSW checks every vector
  • expecting approximate results to always match brute force
  • tuning ef without measuring recall
  • using very low graph connectivity and expecting high recall
  • ignoring build-time settings such as efConstruction
  • forgetting that higher recall usually costs latency or memory

Summary

HNSW works by building a layered graph of vectors. Search starts in sparse upper layers, makes fast long-range moves, descends into denser layers, and returns close candidates from the bottom layer.

The algorithm is fast because it avoids comparing against every vector. It is approximate because it explores only part of the graph. Its practical value comes from tuning the balance between recall, latency, memory, and indexing cost.