Distance metrics affect vector search results because they define what “close” means. If the metric changes, the ranking can change even when the query, vectors, and database stay the same.
This matters for semantic search, RAG, recommendations, clustering, duplicate detection, and any system that retrieves nearest vectors.
The Short Version
A vector database compares a query vector with stored vectors. The distance metric decides how that comparison is calculated.
Different metrics can prefer different results:
- cosine focuses on direction
- L2 focuses on straight-line coordinate distance
- dot product considers alignment and magnitude
- Manhattan distance adds absolute coordinate differences
- Hamming distance counts differing positions
Because each metric measures a different kind of closeness, changing the metric can change the top results.
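As a rough illustration of the list above, here is a minimal Python sketch of each metric using only the standard library. The function names and the two sample vectors are assumptions made for this example; a real vector database implements these calculations internally and may use different conventions, such as returning similarity instead of distance.

```python
import math

def dot(a, b):
    # Dot product: rewards alignment and magnitude (higher means more similar).
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    # Euclidean length of a vector.
    return math.sqrt(dot(a, a))

def cosine_distance(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    return 1.0 - dot(a, b) / (norm(a) * norm(b))

def l2_distance(a, b):
    # Straight-line (Euclidean) distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    # Sum of absolute coordinate differences (L1).
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming_distance(a, b):
    # Number of positions where the two vectors differ.
    return sum(1 for x, y in zip(a, b) if x != y)

# Illustrative vectors: same direction, different length.
query = [1.0, 0.0]
stored = [2.0, 0.0]

print("cosine distance:", cosine_distance(query, stored))
print("L2 distance:", l2_distance(query, stored))
print("dot product:", dot(query, stored))
print("Manhattan distance:", manhattan_distance(query, stored))
print("Hamming distance:", hamming_distance(query, stored))
```

For this pair, cosine distance is zero because the direction is identical, while L2 and Manhattan report a gap of 1 and the dot product grows with the longer vector.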
Why Ranking Changes
Vector search is a ranking problem.
The database usually asks: which stored vectors are nearest to this query vector?
If “nearest” means smallest cosine distance, one set of results may win. If “nearest” means smallest L2 distance, another set may win. If “nearest” means strongest dot product, the ranking may change again.
This is not a bug. It is the expected result of using different mathematical definitions of similarity.
Example: Direction vs Distance
Imagine two stored vectors:
- Vector A points in almost the same direction as the query but has a different length.
- Vector B is closer in raw coordinate distance but points in a slightly different direction.
Cosine distance may prefer Vector A because it focuses on direction.
L2 distance may prefer Vector B because it focuses on coordinate distance.
Neither answer is automatically wrong. The right answer depends on what the embedding model expects and what the application needs.
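The scenario above can be sketched with two-dimensional vectors. The coordinates are invented purely for illustration: Vector A is long but almost parallel to the query, and Vector B is nearby but tilted.

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity; assumes non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = [1.0, 0.0]
candidates = {
    "A": [3.0, 0.1],  # nearly the same direction, different length
    "B": [0.9, 0.5],  # closer in coordinates, different direction
}

# The nearest candidate is the one with the smallest distance.
by_cosine = min(candidates, key=lambda name: cosine_distance(query, candidates[name]))
by_l2 = min(candidates, key=lambda name: l2_distance(query, candidates[name]))

print("cosine prefers:", by_cosine)  # A
print("L2 prefers:", by_l2)          # B
```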
Distance vs Similarity Interpretation
One of the easiest mistakes is sorting scores in the wrong direction.
With distance values:
- lower usually means closer
- higher usually means farther
With similarity scores:
- higher usually means more similar
- lower usually means less similar
If the database returns a distance and the application treats it like a similarity score, the ranking can be reversed.
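A small sketch of this mistake, assuming the database returns cosine distance defined as 1 minus cosine similarity. The document IDs and values are hypothetical, and other systems may define their scores differently, so check the documentation of the one in use.

```python
# Hypothetical search results: (document id, cosine distance).
results = [("doc2", 0.48), ("doc1", 0.12), ("doc3", 0.91)]

# Correct for distances: sort ascending so the closest result comes first.
closest_first = sorted(results, key=lambda r: r[1])

# The bug: sorting distances as if they were similarity scores.
reversed_by_mistake = sorted(results, key=lambda r: r[1], reverse=True)

# If the application needs similarity, convert explicitly and sort descending.
# Assumes distance = 1 - similarity, which is a convention, not a guarantee.
as_similarity = [(doc_id, 1.0 - distance) for doc_id, distance in results]
best_by_similarity = sorted(as_similarity, key=lambda r: r[1], reverse=True)

print([doc_id for doc_id, _ in closest_first])        # ['doc1', 'doc2', 'doc3']
print([doc_id for doc_id, _ in reversed_by_mistake])  # ['doc3', 'doc2', 'doc1']
print([doc_id for doc_id, _ in best_by_similarity])   # ['doc1', 'doc2', 'doc3']
```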
Cosine Distance Effects
Cosine distance focuses on vector direction.
This is often useful for text embeddings because two pieces of text can be about the same topic even if their vector magnitudes differ. Cosine-style comparison can reduce the importance of length and focus more on angular similarity.
Cosine distance can give worse results when magnitude carries information. If vector length matters in the model, cosine discards that part of the signal.
L2 Distance Effects
L2 distance measures straight-line coordinate distance.
It can prefer vectors that are close in actual coordinate space. This can be useful when the embedding model was trained or evaluated with Euclidean-style distance.
L2 can be affected by vector magnitude. If two vectors point in similar directions but have very different lengths, L2 may treat them as farther apart than cosine would.
Squared L2 Effects
Squared L2 distance uses the same coordinate differences as L2 but skips the square root.
In an exact search it preserves the nearest-neighbor order of ordinary L2, because the square root never changes which of two non-negative values is larger. The returned values are on a different scale, though. This matters for thresholds, score displays, and custom ranking rules.
A threshold for ordinary L2 should not be reused as a threshold for squared L2.
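The following sketch shows both points with invented vectors and an invented threshold of 2.0: the ordering is the same under L2 and squared L2, but reusing the L2 threshold on squared values drops a result that should have passed.

```python
import math

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def l2(a, b):
    return math.sqrt(squared_l2(a, b))

query = [0.0, 0.0]
stored = {
    "near": [0.3, 0.4],  # L2 is about 0.5, squared L2 about 0.25
    "mid": [0.9, 1.2],   # L2 is about 1.5, squared L2 about 2.25
    "far": [3.0, 4.0],   # L2 is 5.0, squared L2 is 25.0
}

# The ranking is identical under both metrics.
order_l2 = sorted(stored, key=lambda name: l2(query, stored[name]))
order_squared = sorted(stored, key=lambda name: squared_l2(query, stored[name]))

l2_threshold = 2.0  # hypothetical threshold tuned for ordinary L2

within_l2 = [n for n, v in stored.items() if l2(query, v) <= l2_threshold]
# Mistake: reusing the L2 threshold on squared values.
reused = [n for n, v in stored.items() if squared_l2(query, v) <= l2_threshold]
# Fix: square the threshold when the metric is squared L2.
converted = [n for n, v in stored.items() if squared_l2(query, v) <= l2_threshold ** 2]

print(order_l2 == order_squared)  # True
print(within_l2)   # ['near', 'mid']
print(reused)      # ['near']
print(converted)   # ['near', 'mid']
```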
Dot Product Effects
Dot product considers alignment and magnitude.
It can reward vectors that point in a similar direction and have larger magnitude. For some embedding models, this is exactly what is intended. For others, it may overemphasize magnitude.
If all vectors are normalized to unit length, the dot product equals cosine similarity, so the two produce the same ranking. If vectors are not normalized, the rankings can differ.
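A minimal sketch of the effect of normalization, using made-up vectors. Before normalization the raw dot product favors the longer vector. After scaling everything to unit length, the dot product matches cosine similarity and favors the better-aligned vector.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Scale to unit length; assumes v is not all zeros.
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query = [1.0, 1.0]
candidates = {
    "short_aligned": [2.0, 2.0],   # same direction as the query
    "long_off_axis": [10.0, 0.0],  # larger magnitude, 45 degrees away
}

# Higher dot product means more similar, so take the maximum.
raw_winner = max(candidates, key=lambda n: dot(query, candidates[n]))
normalized_winner = max(
    candidates,
    key=lambda n: dot(normalize(query), normalize(candidates[n])),
)

print("raw dot product prefers:", raw_winner)
print("normalized dot product prefers:", normalized_winner)
```

Here the raw dot product prefers `long_off_axis` and the normalized version prefers `short_aligned`, which is also what cosine similarity would choose.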
Manhattan or L1 Effects
Manhattan distance adds absolute coordinate differences.
It can behave differently from L2 because it does not square differences. L2 squares each coordinate gap before summing, so a few large gaps dominate the result, while L1 weighs every gap in proportion to its size.
This can affect rankings when one candidate has one large difference and another candidate has many small differences.
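To make that concrete, here is a small sketch with invented four-dimensional vectors: one candidate differs from the query in a single large coordinate, the other in four small ones.

```python
import math

def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = [0.0, 0.0, 0.0, 0.0]
candidates = {
    "one_big_gap": [3.0, 0.0, 0.0, 0.0],      # L1 = 3, L2 = 3
    "many_small_gaps": [1.0, 1.0, 1.0, 1.0],  # L1 = 4, L2 = 2
}

by_l1 = min(candidates, key=lambda n: l1_distance(query, candidates[n]))
by_l2 = min(candidates, key=lambda n: l2_distance(query, candidates[n]))

print("L1 prefers:", by_l1)  # one_big_gap
print("L2 prefers:", by_l2)  # many_small_gaps
```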
Hamming Distance Effects
Hamming distance counts differing positions.
It is useful for binary, categorical, or discrete vector representations. It is usually not the default choice for ordinary dense text embeddings.
If applied to dense floating-point vectors, it only records whether each coordinate differs, not by how much, so it discards most of the magnitude and coordinate information.
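The sketch below, with made-up data, shows Hamming distance working as intended on binary codes and then misbehaving on dense floats, where a nearly identical vector scores worse than one with a single wildly different coordinate.

```python
def hamming_distance(a, b):
    # Count positions where the two sequences differ.
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))

# Binary codes: Hamming distance is a natural fit.
query_code = [1, 0, 1, 1, 0, 0, 1, 0]
similar_code = [1, 0, 1, 1, 0, 0, 1, 1]    # one bit flipped
different_code = [0, 1, 0, 0, 1, 1, 0, 1]  # every bit flipped

print(hamming_distance(query_code, similar_code))    # 1
print(hamming_distance(query_code, different_code))  # 8

# Dense floats: any tiny change counts as a full mismatch.
dense_query = [0.12, 0.80]
nearly_identical = [0.13, 0.81]  # both coordinates differ slightly
one_huge_gap = [0.12, -5.0]      # one coordinate matches exactly

print(hamming_distance(dense_query, nearly_identical))  # 2
print(hamming_distance(dense_query, one_huge_gap))      # 1
```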
Thresholds Change With the Metric
Distance thresholds are metric-specific.
A threshold such as “return only results with distance below 0.5” means different things depending on the metric.
It also depends on:
- the embedding model
- the number of vector dimensions
- whether vectors are normalized
- whether the score is raw or transformed
- whether the metric is ordinary L2 or squared L2
Thresholds should be learned from real examples, not guessed.
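One simple way to learn a threshold is to label a sample of query and result pairs as relevant or not, then pick the cutoff that classifies them best. The sketch below uses accuracy as the selection rule and entirely made-up distances. Both are assumptions for illustration, and a real project might prefer precision, recall, or F1 on a much larger sample.

```python
# Hypothetical labeled pairs: (distance returned by the database, is_relevant).
# These values are invented for illustration only.
labeled = [
    (0.10, True), (0.18, True), (0.25, True), (0.27, True),
    (0.22, False), (0.35, False), (0.40, False), (0.55, False),
]

def accuracy_at(threshold, pairs):
    # A pair is predicted relevant when its distance is at or below the threshold.
    correct = sum((distance <= threshold) == relevant for distance, relevant in pairs)
    return correct / len(pairs)

# Try every observed distance as a candidate threshold.
candidate_thresholds = sorted(distance for distance, _ in labeled)
best_threshold = max(candidate_thresholds, key=lambda t: accuracy_at(t, labeled))

print("best threshold:", best_threshold)
print("accuracy:", accuracy_at(best_threshold, labeled))
```

A threshold found this way applies only to the metric, model, and normalization used to produce the labeled distances. Changing any of them means repeating the exercise.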
Metric Choice Affects RAG
In RAG, vector search decides which chunks are sent to the language model.
If the distance metric ranks the wrong chunks highly, the answer may be incomplete or inaccurate. The language model may be strong, but it cannot use context that was not retrieved.
For RAG systems, test metric choice with answer quality, not only with nearest-neighbor speed.
Metric Choice Affects Evaluation
Changing the distance metric can change evaluation results.
Evaluation measures such as recall, precision, MRR, and nDCG may improve or decline when the distance metric changes. A system can look good under one distance metric and weaker under another.
When comparing distance metrics, use the same dataset, same queries, same filters, and same evaluation method.
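As a sketch of such a comparison, the code below scores two rankings of the same query with recall@k and reciprocal rank. The rankings and relevance labels are hypothetical. In practice they would come from running the same queries against indexes built with each distance metric.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    # Fraction of the relevant documents that appear in the top k.
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

def reciprocal_rank(ranked_ids, relevant_ids):
    # 1 / position of the first relevant result, or 0 if none is found.
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / position
    return 0.0

# Hypothetical relevance labels and rankings for a single query.
relevant = {"d2", "d5"}
rankings = {
    "cosine": ["d2", "d5", "d1", "d3"],
    "l2": ["d1", "d2", "d3", "d5"],
}

report = {
    name: {
        "recall@2": recall_at_k(ranked, relevant, k=2),
        "reciprocal_rank": reciprocal_rank(ranked, relevant),
    }
    for name, ranked in rankings.items()
}

for name, scores in report.items():
    print(name, scores)
```

Averaging reciprocal rank over all benchmark queries gives MRR.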
Metric Choice Affects Filtering
Filters and distance metrics interact.
If metadata filters reduce the candidate set, the distance metric decides which remaining candidates are ranked highest. Highly selective filters can make score interpretation harder because the best available result may still be weak.
Evaluate filtered search separately from unfiltered search.
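The sketch below illustrates why. It uses a tiny in-memory corpus with a hypothetical `lang` metadata field and L2 distance. With the filter applied, the top result is still ranked first, but its distance shows it is a weak match.

```python
import math

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical documents with one metadata field.
corpus = [
    {"id": "d1", "lang": "en", "vector": [1.0, 0.1]},
    {"id": "d2", "lang": "en", "vector": [0.7, 0.0]},
    {"id": "d3", "lang": "de", "vector": [-0.8, 0.6]},
]
query = [1.0, 0.0]

def search(query_vector, docs, lang=None):
    # Apply the metadata filter first, then rank what remains by distance.
    pool = [d for d in docs if lang is None or d["lang"] == lang]
    scored = [(d["id"], l2_distance(query_vector, d["vector"])) for d in pool]
    return sorted(scored, key=lambda item: item[1])

unfiltered = search(query, corpus)
filtered = search(query, corpus, lang="de")

print("unfiltered top:", unfiltered[0])
print("filtered top:", filtered[0])
```

Being ranked first after filtering does not mean a result is close, which is one reason to inspect distances, not only positions, in filtered tests.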
Metric Choice Affects Hybrid Search
Hybrid search combines vector search with keyword search.
If vector scores are distance values and keyword scores are similarity-style values, the system must combine them carefully. Raw scores from different retrieval methods are usually not directly comparable.
Score normalization, weighting, and fusion rules should be tested with real queries.
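One way to sidestep incomparable raw scores is to fuse ranks instead of scores. The sketch below uses reciprocal rank fusion, named here only as an example of a fusion rule and not as something this article prescribes. The constant `k=60` is a commonly used default, and the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    # Each ranking is a list of document ids, best first.
    # A document earns 1 / (k + rank) from every ranking it appears in.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Vector results ordered by ascending distance (closest first).
vector_ranking = ["d1", "d2", "d3"]
# Keyword results ordered by descending keyword score (best first).
keyword_ranking = ["d2", "d4", "d1"]

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print(fused)  # ['d2', 'd1', 'd4', 'd3']
```

Because only positions are used, it does not matter that one retriever reports distances and the other reports similarity-style scores, as long as each list is sorted best first.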
Use the Metric Expected by the Model
The best starting point is the embedding model’s recommendation.
Some models are trained or documented for cosine similarity. Others may support dot product or Euclidean distance. If the model expects one metric and the database uses another, retrieval quality can drop.
The metric is part of the embedding system, not just a database setting.
Common Mistakes
Common mistakes include:
- using cosine because it is common without checking the model
- sorting distance values as if higher is better
- copying thresholds from one metric to another
- comparing raw cosine, L2, and dot product scores directly
- switching metrics without rebuilding evaluation baselines
- testing unfiltered search but deploying filtered search
- assuming speed and relevance move together
Practical Testing Approach
To test metric impact, use a small but realistic benchmark.
Include:
- real user queries
- known good results
- hard negatives
- short and long queries
- domain-specific vocabulary
- filtered and unfiltered cases
Compare result order under each candidate metric. Look at both numeric evaluation and manual result quality.
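A minimal harness for this comparison might look like the following. The corpus, query, and metric set are placeholders. In a real test the vectors would come from the embedding model and the rankings from the database itself.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # Assumes non-zero vectors.
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_dot(a, b):
    # Negated so that, like the distances, smaller means closer.
    return -dot(a, b)

METRICS = {"cosine": cosine_distance, "l2": l2_distance, "dot": negative_dot}

def top_k(query, corpus, metric, k):
    # Rank document ids by ascending metric value and keep the first k.
    return sorted(corpus, key=lambda doc_id: metric(query, corpus[doc_id]))[:k]

# Placeholder vectors for illustration.
corpus = {
    "d1": [3.0, 0.1],
    "d2": [0.9, 0.5],
    "d3": [0.2, 0.9],
    "d4": [-1.0, 0.0],
}
query = [1.0, 0.0]

results = {name: top_k(query, corpus, fn, k=2) for name, fn in METRICS.items()}
for name, ids in results.items():
    print(name, ids)

# Overlap between two metrics' top-k lists is a quick first signal.
overlap = set(results["cosine"]) & set(results["l2"])
print("cosine/L2 overlap:", sorted(overlap))
```

Low overlap between metrics is a cue to inspect those queries by hand before trusting any aggregate score.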
Summary
Distance metrics affect vector search results by changing how the system defines closeness. Cosine, L2, dot product, Manhattan, and Hamming can rank the same vectors differently.
Metric choice affects ranking, thresholds, RAG quality, hybrid search, filtering, and evaluation.
The safest rule is to use the metric expected by the embedding model, then validate with real queries and relevance tests.