Choose a distance metric for embeddings by starting with the metric the embedding model was trained or evaluated to use. Then verify that choice with real queries, expected results, filters, and the retrieval quality your application needs.
There is no single best distance metric for every embedding system. Cosine, dot product, L2, squared L2, Manhattan, and Hamming measure different kinds of closeness. The best choice depends on the model, data, and use case.
Start With the Embedding Model
The safest first rule is simple: use the distance metric recommended by the embedding model.
Embedding models are trained with assumptions about how vectors will be compared. If a model was trained with cosine similarity, cosine-style search is usually the right starting point. If a model was trained with Euclidean distance, L2 or squared L2 may be appropriate. If a model expects dot product, use dot-product-style scoring.
The distance metric is not only a database option. It is part of the embedding system.
Check the Model Documentation
Before choosing a metric, check the model documentation for:
- recommended similarity or distance metric
- whether vectors should be normalized
- whether dot product, cosine, or L2 was used in training
- whether the model supports multiple metrics
- benchmark examples or retrieval guidance
If the provider gives a clear recommendation, use that as the default unless your own evaluation shows that another metric works better.
Understand the Main Options
Different metrics answer different questions.
Cosine asks: do the vectors point in a similar direction?
Dot product asks: how strongly do the vectors align, including magnitude?
L2 asks: how far apart are the vectors in coordinate space?
Manhattan asks: what is the sum of absolute coordinate differences?
Hamming asks: how many positions differ?
Because these definitions differ, the same query can return results in a different order under each metric, so metric choice directly affects search results.
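As a minimal sketch of the definitions above, the following pure-Python functions compute each measure on plain lists. The function names, the toy query, and the two toy documents are illustrative assumptions, not taken from any particular vector database. The example is built so that cosine and L2 prefer one document while dot product prefers the other.

```python
import math

def dot(a, b):
    # Alignment including magnitude: higher means more similar.
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Direction only; assumes neither vector is all zeros.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l2_distance(a, b):
    # Straight-line distance: lower means closer.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    # Sum of absolute coordinate differences: lower means closer.
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming_distance(a, b):
    # Number of positions that differ: lower means closer.
    return sum(x != y for x, y in zip(a, b))

# Hypothetical toy data: one query and two candidate documents.
query = [1.0, 0.0]
docs = {"short_aligned": [2.0, 0.0], "long_off_axis": [3.0, 3.0]}

# Similarities are maximized; distances are minimized.
best_by_cosine = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
best_by_dot = max(docs, key=lambda name: dot(query, docs[name]))
best_by_l2 = min(docs, key=lambda name: l2_distance(query, docs[name]))
```

Here the longer vector wins under dot product because its magnitude raises the score, even though the shorter vector points in exactly the same direction as the query.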
When Cosine Is a Good Starting Point
Cosine similarity or cosine distance is often a good starting point for text embeddings when the model documentation recommends it.
Cosine focuses on direction rather than raw vector length. This can work well when documents should match by meaning even if their vector magnitudes differ.
Use cosine when:
- the model recommends cosine similarity
- direction matters more than magnitude
- vectors are normalized or intended for angular comparison
- you are building semantic text search with a model designed for cosine
When Dot Product May Fit
Dot product can be useful when both direction and magnitude matter.
Some embedding models are trained for dot-product retrieval. In that case, dot product may be better than cosine or L2.
Use dot product when:
- the model recommends dot product
- vector magnitude contains useful signal
- the retrieval benchmark for the model uses dot product
- your evaluation shows better ranking than cosine or L2
Be careful with score interpretation. Some systems expose dot product as a similarity-style score, while others transform it into a distance-style value.
When L2 or Squared L2 May Fit
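L2 distance, also called Euclidean distance, measures the straight-line distance between vectors in coordinate space. Squared L2 uses the same coordinate differences but skips the final square root, so it ranks results in the same order as L2 while returning different score values.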
Use L2 or squared L2 when:
- the model was trained or evaluated with Euclidean distance
- coordinate distance matters for the representation
- the index is optimized for L2-style retrieval
- your evaluation shows better results than cosine or dot product
Remember that L2 is a distance: lower values mean closer vectors.
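The relationship between L2 and squared L2 can be checked with a short sketch. The vectors and the cutoff value of 3.0 are made-up illustrations. The ranking is identical under both, but the same numeric cutoff admits different documents.

```python
import math

def squared_l2(a, b):
    # Sum of squared coordinate differences, with no square root.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def l2(a, b):
    # Ordinary Euclidean distance.
    return math.sqrt(squared_l2(a, b))

# Hypothetical toy data.
query = [0.0, 0.0]
docs = {"a": [3.0, 4.0], "b": [1.0, 1.0], "c": [0.0, 2.0]}

# Both are distances, so sort ascending (closest first).
rank_l2 = sorted(docs, key=lambda name: l2(query, docs[name]))
rank_squared = sorted(docs, key=lambda name: squared_l2(query, docs[name]))

# The same numeric cutoff means something different for each metric.
cutoff = 3.0
within_l2 = [name for name in rank_l2 if l2(query, docs[name]) <= cutoff]
within_squared = [name for name in rank_squared if squared_l2(query, docs[name]) <= cutoff]
```

Document "c" sits at L2 distance 2.0, which passes the cutoff, but at squared L2 distance 4.0, which does not.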
When Manhattan Distance May Fit
Manhattan distance, or L1 distance, adds absolute coordinate differences.
It may be useful for some feature-engineered, sparse, or non-standard vector representations. It is less commonly the default for modern dense text embeddings unless the model or task supports it.
Use Manhattan only when the data and evaluation justify it.
When Hamming Distance May Fit
Hamming distance counts positions that differ.
It is useful for binary, categorical, or discrete vector representations. It is usually not the right choice for ordinary dense floating-point embeddings.
Use Hamming when the vector representation is designed for position-wise difference counting.
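As an illustration, assuming binary codes are stored as Python integers (one bit per position), Hamming distance is the number of set bits in the XOR of the two codes. The helper name and the two 8-bit codes are hypothetical.

```python
def hamming_bits(a: int, b: int) -> int:
    # XOR leaves a 1 wherever the two codes disagree; count those bits.
    return bin(a ^ b).count("1")

# Hypothetical 8-bit binary codes.
code_a = 0b10110010
code_b = 0b10011010

distance = hamming_bits(code_a, code_b)  # the codes differ in two positions
```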
Do Not Choose by Popularity Alone
Cosine is common, but common does not mean correct for every model. L2 is familiar, but familiar does not mean correct for every embedding space. Dot product can be powerful, but only when the model and score interpretation support it.
The metric should be chosen because it matches the model and improves retrieval quality, not because it appears often in examples.
Evaluate With Real Queries
After choosing a starting metric, evaluate it.
Create a small benchmark with:
- real user queries
- known relevant documents
- hard negative examples
- short and long queries
- domain-specific terms
- filtered and unfiltered search cases
- queries that should return no strong match
Then compare metrics using the same data and same retrieval pipeline.
Measure More Than Top-1
Do not judge only the first result.
Useful evaluation measures include:
- recall at k
- precision at k
- MRR
- nDCG
- RAG answer quality
- manual relevance review
For RAG, the best distance metric is the one that retrieves the context needed to answer correctly, not the one that produces the nicest-looking nearest-neighbor scores.
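Three of the measures listed above can be sketched in a few lines. The function names, query IDs, and document IDs are illustrative assumptions. In practice the ranked lists would come from running each candidate distance metric through the same retrieval pipeline.

```python
def recall_at_k(ranked, relevant, k):
    # Fraction of the known relevant documents that appear in the top k.
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)

def precision_at_k(ranked, relevant, k):
    # Fraction of the top k results that are relevant.
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k

def reciprocal_rank(ranked, relevant):
    # 1 / position of the first relevant result, or 0.0 if none is found.
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0

# Hypothetical benchmark: ranked results per query and known relevant documents.
results = {"q1": ["d3", "d1", "d7"], "q2": ["d5", "d9", "d2"]}
relevant = {"q1": {"d1", "d4"}, "q2": {"d2"}}

k = 3
mean_recall = sum(recall_at_k(results[q], relevant[q], k) for q in results) / len(results)
mean_precision = sum(precision_at_k(results[q], relevant[q], k) for q in results) / len(results)
mrr = sum(reciprocal_rank(results[q], relevant[q]) for q in results) / len(results)
```

Running the same benchmark once per candidate metric and comparing these averages side by side gives a more stable picture than inspecting individual top-1 results.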
Test With Filters
Many production searches are filtered.
For example:
- only this tenant’s data
- only documents the user can access
- only one language
- only recent content
- only a product category
A metric may perform well on open search but worse after filters reduce the candidate set. Test the metric under realistic filters.
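A minimal brute-force sketch shows how a filter changes what counts as the top result. It assumes dot-product scoring and a hypothetical `lang` metadata field. Real vector databases apply filters in their own ways, so treat this only as a way to reason about the effect.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search(query, corpus, k, allow=None):
    # Keep only documents that pass the optional filter, then rank by
    # dot product, highest first.
    candidates = [doc for doc in corpus if allow is None or allow(doc)]
    ranked = sorted(candidates, key=lambda doc: dot(query, doc["vector"]), reverse=True)
    return [doc["id"] for doc in ranked[:k]]

# Hypothetical corpus with a language tag on each document.
corpus = [
    {"id": "en-1", "lang": "en", "vector": [0.9, 0.1]},
    {"id": "de-1", "lang": "de", "vector": [0.8, 0.3]},
    {"id": "de-2", "lang": "de", "vector": [0.2, 0.9]},
]
query = [1.0, 0.0]

open_top = search(query, corpus, k=2)
filtered_top = search(query, corpus, k=2, allow=lambda doc: doc["lang"] == "de")
```

With the filter applied, the second slot is filled by a much weaker match, which is exactly the kind of case a benchmark limited to open search would miss.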
Set Thresholds After Choosing the Metric
Distance and similarity thresholds are metric-specific.
A cosine threshold cannot be reused as an L2 threshold. An L2 threshold cannot be reused as a squared L2 threshold without squaring it. A threshold from one embedding model may not work for another model.
Set thresholds after you choose the metric and after you inspect real score distributions.
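One simple way to pick a threshold from observed scores is to sweep candidate cutoffs over labeled examples and keep the one that classifies the most pairs correctly. The sketch below assumes similarity-style scores (higher is closer) and uses made-up score values. The helper name `best_threshold` and the accuracy criterion are illustrative choices, not a standard.

```python
def best_threshold(relevant_scores, irrelevant_scores):
    # Try each observed score as a cutoff. A pair counts as correct when a
    # relevant score is >= the cutoff or an irrelevant score is < the cutoff.
    candidates = sorted(set(relevant_scores) | set(irrelevant_scores))
    best, best_correct = None, -1
    for cutoff in candidates:
        correct = sum(s >= cutoff for s in relevant_scores)
        correct += sum(s < cutoff for s in irrelevant_scores)
        if correct > best_correct:  # ties keep the lower cutoff
            best, best_correct = cutoff, correct
    return best

# Hypothetical cosine similarities collected from a labeled benchmark.
relevant_scores = [0.82, 0.78, 0.74, 0.69]
irrelevant_scores = [0.71, 0.55, 0.48, 0.40]

threshold = best_threshold(relevant_scores, irrelevant_scores)
```

The two distributions overlap (an irrelevant pair scores 0.71 while a relevant pair scores 0.69), so no cutoff is perfect. Seeing that overlap is the reason to inspect real scores before fixing a number.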
Watch Distance vs Similarity Direction
Distance and similarity scores move in opposite directions.
- Distance: lower is usually closer.
- Similarity: higher is usually closer.
Before sorting, filtering, or displaying scores, confirm which type of value the database returns.
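A small helper can make the direction explicit instead of leaving it implicit in a sort call. The `score_kind` parameter and the `(id, score)` tuple shape are assumptions for illustration.

```python
def rank_hits(hits, score_kind):
    # hits: list of (doc_id, score) pairs.
    # score_kind: "distance" (lower is closer) or "similarity" (higher is closer).
    if score_kind == "distance":
        descending = False
    elif score_kind == "similarity":
        descending = True
    else:
        raise ValueError(f"unknown score kind: {score_kind!r}")
    return sorted(hits, key=lambda hit: hit[1], reverse=descending)

# Hypothetical hits with raw scores.
hits = [("a", 0.2), ("b", 0.9)]

as_distance = rank_hits(hits, "distance")      # "a" first: smallest distance
as_similarity = rank_hits(hits, "similarity")  # "b" first: largest similarity
```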
Account for Vector Normalization
Vector normalization can change metric behavior.
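When vectors are normalized to unit length, cosine similarity and dot product return the same value, and squared L2 distance equals 2 minus twice the cosine similarity, so all three rank results in the same order. Without normalization, vector magnitude can change rankings.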
If model documentation says to normalize vectors, follow that guidance consistently during both indexing and querying.
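The effect of normalization can be verified numerically. This sketch uses two made-up vectors and checks that, after scaling both to unit length, the dot product matches the cosine similarity of the original vectors and the squared L2 distance equals 2 - 2 * cosine.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Scale a vector to unit length; a zero vector has no direction.
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical raw vectors with different lengths.
a = [3.0, 4.0]
b = [1.0, 0.0]

cos_raw = cosine_similarity(a, b)      # direction-only comparison of raw vectors
a_unit, b_unit = normalize(a), normalize(b)
dot_unit = dot(a_unit, b_unit)         # dot product after normalization
sq_l2_unit = squared_l2(a_unit, b_unit)
```

Applying `normalize` at indexing time but not at query time (or the reverse) breaks these identities, which is why the same step belongs on both sides.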
Consider Performance, But Do Not Lead With It
Distance calculations affect performance. Some metrics are faster in a given database or hardware environment than others.
Performance matters, especially at scale, but it should not be the only factor in the decision. A fast metric that retrieves poor results can increase downstream cost through reranking, larger prompts, failed answers, or user frustration.
Start with relevance, then tune for speed.
Decision Checklist
Use this checklist:
- Read the embedding model documentation.
- Use the recommended metric as the starting point.
- Confirm whether vectors should be normalized.
- Index and query with the same assumptions.
- Evaluate with real queries and expected results.
- Test filtered search, not only open search.
- Choose thresholds from real score distributions.
- Monitor retrieval quality after deployment.
Common Mistakes
Common mistakes include:
- choosing cosine only because it is common
- using L2 with a model trained for cosine without testing
- mixing normalized and unnormalized vectors
- sorting distance scores in the wrong direction
- copying thresholds from a different model
- evaluating unfiltered search while deploying filtered search
- changing the metric without rebuilding evaluation baselines
Summary
Choose a distance metric for embeddings by matching the embedding model first, then validating with real retrieval examples.
Cosine, dot product, L2, Manhattan, and Hamming all measure different kinds of closeness. The right metric depends on the model, vector normalization, data, filters, and application quality requirements.
For production vector search, metric choice should be treated as a relevance decision, not just a database configuration detail.