How to Choose a Distance Metric for Embeddings

Choose a distance metric for embeddings by starting with the metric the embedding model was trained or evaluated to use. Then verify that choice with real queries, expected results, filters, and the retrieval quality your application needs.

There is no single best distance metric for every embedding system. Cosine, dot product, L2, squared L2, Manhattan, and Hamming measure different kinds of closeness. The best choice depends on the model, data, and use case.

Start With the Embedding Model

The safest first rule is simple: use the distance metric recommended by the embedding model.

Embedding models are trained with assumptions about how vectors will be compared. If a model was trained with cosine similarity, cosine-style search is usually the right starting point. If a model was trained with Euclidean distance, L2 or squared L2 may be appropriate. If a model expects dot product, use dot-product-style scoring.

The distance metric is not only a database option. It is part of the embedding system.

Check the Model Documentation

Before choosing a metric, check the model documentation for:

recommended similarity or distance metric
whether vectors should be normalized
whether dot product, cosine, or L2 was used in training
whether the model supports multiple metrics
benchmark examples or retrieval guidance

If the provider gives a clear recommendation, use that as the default unless your own evaluation shows that another metric works better.

Understand the Main Options

Different metrics answer different questions.

Cosine asks: do the vectors point in a similar direction?

Dot product asks: how strongly do the vectors align, including magnitude?

L2 asks: how far apart are the vectors in coordinate space?

Manhattan asks: what is the sum of absolute coordinate differences?

Hamming asks: how many positions differ?

These definitions can produce different rankings for the same vectors, so metric choice directly affects search results.
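To make the differences concrete, here is a minimal sketch in plain Python. The function names, the two-dimensional vectors, and the document labels are invented for illustration; a real system would use the model's actual embeddings and the database's own implementation of each metric.

```python
import math

def dot(a, b):
    """Dot product: alignment of two vectors, including magnitude."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine: alignment of direction only (assumes non-zero vectors)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l2_distance(a, b):
    """Euclidean distance in coordinate space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming_distance(a, b):
    """Number of positions whose values differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

# Hypothetical toy vectors, chosen so that the metrics disagree.
query = [1.0, 0.0]
docs = {
    "a": [5.0, 0.0],  # same direction as the query, but far away
    "b": [1.0, 1.0],  # different direction, but close in space
    "c": [6.0, 6.0],  # different direction, large magnitude
}

# Similarities: higher is closer. Distances: lower is closer.
best_by_cosine = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
best_by_dot = max(docs, key=lambda name: dot(query, docs[name]))
best_by_l2 = min(docs, key=lambda name: l2_distance(query, docs[name]))
best_by_manhattan = min(docs, key=lambda name: manhattan_distance(query, docs[name]))

print(best_by_cosine, best_by_dot, best_by_l2, best_by_manhattan)
```

With these toy vectors, each of cosine, dot product, and L2 picks a different top result for the same query. That is the practical reason to treat the metric as part of the embedding system.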

When Cosine Is a Good Starting Point

Cosine similarity or cosine distance is often a good starting point for text embeddings when the model documentation recommends it.

Cosine focuses on direction rather than raw vector length. This can work well when documents should match by meaning even if their vector magnitudes differ.

Use cosine when:

the model recommends cosine similarity
direction matters more than magnitude
vectors are normalized or intended for angular comparison
you are building semantic text search with a model designed for cosine

When Dot Product May Fit

Dot product can be useful when both direction and magnitude matter.

Some embedding models are trained for dot-product retrieval. In that case, dot product may be better than cosine or L2.

Use dot product when:

the model recommends dot product
vector magnitude contains useful signal
the retrieval benchmark for the model uses dot product
your evaluation shows better ranking than cosine or L2

Be careful with score interpretation. Some systems expose dot product as a similarity-style score, while others transform it into a distance-style value.

When L2 or Squared L2 May Fit

L2 distance, also called Euclidean distance, measures coordinate distance between vectors. Squared L2 uses the same differences but skips the final square root. Because the square root is monotonic, both produce the same ranking; only the score values differ.

Use L2 or squared L2 when:

the model was trained or evaluated with Euclidean distance
coordinate distance matters for the representation
the index is optimized for L2-style retrieval
your evaluation shows better results than cosine or dot product

Remember that L2 is a distance: lower values mean closer vectors.
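As a small illustration of how L2 and squared L2 relate, the sketch below ranks the same toy vectors with both. The vectors and names are made up for the example.

```python
import math

def squared_l2(a, b):
    """Sum of squared coordinate differences (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def l2(a, b):
    """Euclidean distance: the square root of squared L2."""
    return math.sqrt(squared_l2(a, b))

# Hypothetical toy data.
query = [0.0, 0.0]
docs = {"near": [1.0, 1.0], "mid": [2.0, 2.0], "far": [3.0, 4.0]}

# Both are distances, so sort ascending: lower means closer.
order_l2 = sorted(docs, key=lambda name: l2(query, docs[name]))
order_squared = sorted(docs, key=lambda name: squared_l2(query, docs[name]))

print(order_l2 == order_squared)          # the rankings agree
print(l2(query, docs["far"]))             # the score values do not
print(squared_l2(query, docs["far"]))
```

The ordering is identical, but the numeric scores are on different scales, which matters as soon as a cutoff value is applied to them.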

When Manhattan Distance May Fit

Manhattan distance, or L1 distance, adds absolute coordinate differences.

It may be useful for some feature-engineered, sparse, or non-standard vector representations. It is less commonly the default for modern dense text embeddings unless the model or task supports it.

Use Manhattan only when the data and evaluation justify it.

When Hamming Distance May Fit

Hamming distance counts positions that differ.

It is useful for binary, categorical, or discrete vector representations. It is usually not the right choice for ordinary dense floating-point embeddings.

Use Hamming when the vector representation is designed for position-wise difference counting.

Do Not Choose by Popularity Alone

Cosine is common, but common does not mean correct for every model. L2 is familiar, but familiar does not mean correct for every embedding space. Dot product can be powerful, but only when the model and score interpretation support it.

The metric should be chosen because it matches the model and improves retrieval quality, not because it appears often in examples.

Evaluate With Real Queries

After choosing a starting metric, evaluate it.

Create a small benchmark with:

real user queries
known relevant documents
hard negative examples
short and long queries
domain-specific terms
filtered and unfiltered search cases
queries that should return no strong match

Then compare metrics using the same data and same retrieval pipeline.

Measure More Than Top-1

Do not judge only the first result.

Useful evaluation measures include:

recall at k
precision at k
mean reciprocal rank (MRR)
normalized discounted cumulative gain (nDCG)
RAG answer quality
manual relevance review

For RAG, the best metric is the one that retrieves the context needed to answer correctly, not only the one that produces a nice-looking nearest-neighbor score.
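One way to compare metrics on the same benchmark is to compute recall at k and MRR for each metric's ranked output. The sketch below assumes you already have, for every query, a ranked list of document IDs per metric and a set of known relevant IDs. The query IDs, document IDs, and run names are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def evaluate(run, judgments, k):
    """Average recall@k and MRR over all judged queries.

    run:       query id -> ranked list of document ids
    judgments: query id -> set of relevant document ids
    """
    recalls = [recall_at_k(run[q], judgments[q], k) for q in judgments]
    ranks = [reciprocal_rank(run[q], judgments[q]) for q in judgments]
    return {
        "recall_at_k": sum(recalls) / len(recalls),
        "mrr": sum(ranks) / len(ranks),
    }

# Hypothetical judgments and rankings, for illustration only.
judgments = {"q1": {"d1", "d2"}, "q2": {"d5"}}
cosine_run = {"q1": ["d1", "d3", "d2"], "q2": ["d4", "d5", "d6"]}
l2_run = {"q1": ["d3", "d1", "d4"], "q2": ["d6", "d4", "d5"]}

cosine_scores = evaluate(cosine_run, judgments, k=2)
l2_scores = evaluate(l2_run, judgments, k=2)
print("cosine:", cosine_scores)
print("l2:", l2_scores)
```

Keeping the judgments fixed and swapping only the run makes the comparison about the metric rather than about the data.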

Test With Filters

Many production searches are filtered.

For example:

only this tenant’s data
only documents the user can access
only one language
only recent content
only a product category

A metric may perform well on open search but worse after filters reduce the candidate set. Test the metric under realistic filters.
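The effect of a filter can be sketched with a tiny exact search that applies a metadata predicate before ranking. The `search` function, the `tenant` field, and the records are assumptions made for the example. Real vector databases apply filters in their own ways, sometimes before and sometimes after the nearest-neighbor step.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity for non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, docs, k, allow=None):
    """Exact top-k search by cosine similarity with an optional pre-filter.

    allow: a predicate on a document record; None means open search.
    Returns (id, similarity) pairs, best first.
    """
    candidates = [d for d in docs if allow is None or allow(d)]
    scored = [(d["id"], cosine_similarity(query_vec, d["vector"])) for d in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical records with a tenant field.
docs = [
    {"id": "a", "tenant": "acme", "vector": [1.0, 0.0]},
    {"id": "b", "tenant": "globex", "vector": [0.9, 0.1]},
    {"id": "c", "tenant": "globex", "vector": [0.0, 1.0]},
]
query = [1.0, 0.0]

open_results = search(query, docs, k=2)
filtered_results = search(query, docs, k=2, allow=lambda d: d["tenant"] == "globex")

print(open_results)
print(filtered_results)
```

In the filtered case the second result has a similarity of zero. It is returned only because the filter left nothing better, which is the kind of behavior an unfiltered benchmark would not reveal.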

Set Thresholds After Choosing the Metric

Distance and similarity thresholds are metric-specific.

A cosine threshold cannot be reused as an L2 threshold. An ordinary L2 threshold cannot be reused as a squared L2 threshold without squaring it. A threshold from one embedding model may not work for another model.

Set thresholds after you choose the metric and after you inspect real score distributions.
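One possible way to derive a threshold from observed scores is to label a sample of query-document pairs as relevant or not, then scan candidate cutoffs and keep the one with the best F1. This is only a sketch: the scores below are invented, F1 is one of several reasonable objectives, and `best_threshold` is a hypothetical helper, not part of any library.

```python
def best_threshold(scored_pairs):
    """Pick the similarity cutoff with the highest F1.

    scored_pairs: list of (similarity, is_relevant) tuples.
    A pair counts as a predicted match when similarity >= threshold.
    Returns (threshold, f1). Ties keep the lowest threshold.
    """
    best_value, best_f1 = None, -1.0
    for threshold in sorted({score for score, _ in scored_pairs}):
        tp = sum(1 for s, rel in scored_pairs if s >= threshold and rel)
        fp = sum(1 for s, rel in scored_pairs if s >= threshold and not rel)
        fn = sum(1 for s, rel in scored_pairs if s < threshold and rel)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_value, best_f1 = threshold, f1
    return best_value, best_f1

# Hypothetical cosine similarities with manual relevance labels.
labeled_scores = [
    (0.91, True),
    (0.84, True),
    (0.78, True),
    (0.74, False),
    (0.62, False),
    (0.55, False),
]

threshold, f1 = best_threshold(labeled_scores)
print(threshold, f1)
```

The resulting cutoff is only meaningful for the metric and model that produced the scores. Changing either one means collecting the distribution again.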

Watch Distance vs Similarity Direction

Distance and similarity scores move in opposite directions.

Distance: lower is usually closer.
Similarity: higher is usually closer.

Before sorting, filtering, or displaying scores, confirm which type of value the database returns.
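A small guard in application code can make the score direction explicit instead of implied. The `rank_results` helper and the `score_kind` labels are assumptions for this sketch. What a given database actually returns has to be confirmed in its documentation.

```python
def rank_results(results, score_kind):
    """Sort (id, score) pairs so the closest match comes first.

    score_kind must be "distance" (lower is closer)
    or "similarity" (higher is closer).
    """
    if score_kind == "distance":
        descending = False
    elif score_kind == "similarity":
        descending = True
    else:
        raise ValueError(f"unknown score kind: {score_kind!r}")
    return sorted(results, key=lambda pair: pair[1], reverse=descending)

# The same hypothetical numbers, interpreted both ways.
raw = [("doc-a", 0.12), ("doc-b", 0.45), ("doc-c", 0.30)]

as_distance = rank_results(raw, "distance")
as_similarity = rank_results(raw, "similarity")

print([doc_id for doc_id, _ in as_distance])
print([doc_id for doc_id, _ in as_similarity])
```

The two orderings are exact reverses of each other, so a wrong assumption about direction returns the worst matches first without raising any error.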

Account for Vector Normalization

Vector normalization can change metric behavior.

When vectors are normalized to unit length, cosine similarity equals the dot product, and squared L2 distance equals 2 minus twice the cosine similarity, so all three produce the same ranking. Without normalization, vector magnitude can change rankings.

If model documentation says to normalize vectors, follow that guidance consistently during both indexing and querying.
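The relationship between the metrics on unit-length vectors can be checked numerically. The sketch below uses made-up vectors and a hypothetical `normalize` helper. The same normalization step would need to run on both indexed documents and incoming queries.

```python
import math

def normalize(v):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical raw embeddings with different magnitudes.
raw_a = [3.0, 4.0]
raw_b = [8.0, 6.0]

unit_a = normalize(raw_a)
unit_b = normalize(raw_b)

# Before normalization, dot product and cosine disagree in value.
print(dot(raw_a, raw_b), cosine_similarity(raw_a, raw_b))

# After normalization, dot product matches cosine similarity,
# and squared L2 matches 2 - 2 * cosine.
cos = cosine_similarity(unit_a, unit_b)
print(math.isclose(dot(unit_a, unit_b), cos))
print(math.isclose(squared_l2(unit_a, unit_b), 2 - 2 * cos))
```

If only one side is normalized, for example the index but not the query, these identities no longer hold and rankings can shift without any visible error.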

Consider Performance, But Do Not Lead With It

Distance calculations affect performance. Some metrics are faster in a given database or hardware environment than others.

Performance matters, especially at scale, but it should not be the only decision. A fast metric that retrieves poor results can increase downstream cost through reranking, larger prompts, failed answers, or user frustration.

Start with relevance, then tune for speed.

Decision Checklist

Use this checklist:

Read the embedding model documentation.
Use the recommended metric as the starting point.
Confirm whether vectors should be normalized.
Index and query with the same assumptions.
Evaluate with real queries and expected results.
Test filtered search, not only open search.
Choose thresholds from real score distributions.
Monitor retrieval quality after deployment.

Common Mistakes

Common mistakes include:

choosing cosine only because it is common
using L2 with a model trained for cosine without testing
mixing normalized and unnormalized vectors
sorting distance scores in the wrong direction
copying thresholds from a different model
evaluating unfiltered search while deploying filtered search
changing the metric without rebuilding evaluation baselines

Summary

Choose a distance metric for embeddings by matching the embedding model first, then validating with real retrieval examples.

Cosine, dot product, L2, Manhattan, and Hamming all measure different kinds of closeness. The right metric depends on the model, vector normalization, data, filters, and application quality requirements.

For production vector search, metric choice should be treated as a relevance decision, not just a database configuration detail.