The L2 distance metric is a way to compare two vectors by measuring the straight-line distance between them. In vector search, it helps decide which stored vectors are closest to a query vector.
L2 distance is also known as Euclidean distance. When a vector database uses an L2-style metric, lower distance values mean closer vectors and usually more similar results under that metric.
What Is a Distance Metric?
A distance metric is a rule for measuring how far apart two vectors are.
Vector search depends on distance metrics because embeddings are lists of numbers. To rank results, the system needs a mathematical way to compare the query vector with stored vectors.
Common distance metrics include:
- L2 or Euclidean distance
- squared L2 distance
- cosine distance
- dot product based scoring
- Manhattan distance
- Hamming distance
Each metric compares vectors differently, so the choice can change search results.
What the L2 Distance Metric Measures
The L2 distance metric measures the straight-line distance between two points, computed from the differences between their matching coordinates.
If vectors are treated as points in space, L2 distance is the direct line from one point to another. In two dimensions, this is easy to draw. In embedding systems, the same idea applies to hundreds or thousands of dimensions.
A smaller L2 distance means the points are closer together. A larger L2 distance means they are farther apart.
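To show how "smaller means closer" turns into a ranking, here is a minimal brute-force sketch. The function name `rank_by_l2`, the document IDs, and the vectors are made up for illustration. Real vector databases typically use an index rather than scanning every vector.

```python
import math

def rank_by_l2(query, stored):
    """Return (id, distance) pairs sorted from closest to farthest."""
    scored = [(vec_id, math.dist(query, vec)) for vec_id, vec in stored.items()]
    return sorted(scored, key=lambda pair: pair[1])

# Hypothetical stored vectors, keyed by an illustrative ID.
stored_vectors = {
    "doc_a": [1.0, 1.0],
    "doc_b": [4.0, 5.0],
    "doc_c": [0.0, 0.0],
}
query = [1.0, 2.0]

ranking = rank_by_l2(query, stored_vectors)
for vec_id, distance in ranking:
    print(vec_id, round(distance, 3))
```

The closest vector appears first because the list is sorted by ascending distance.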
Simple Example
Suppose two vectors are:
A = [2, 3]
B = [5, 7]
The distance between them is found by comparing matching components:
5 - 2 = 3
7 - 3 = 4
Then the straight-line distance is:
sqrt(3^2 + 4^2) = sqrt(25) = 5
The L2 distance is 5.
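The same calculation can be sketched in Python. The helper name `l2_distance` is an illustrative choice, not part of any particular library.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    # Subtract matching components, square, sum, then take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

A = [2, 3]
B = [5, 7]
distance = l2_distance(A, B)
print(distance)  # 5.0
```

Python's standard library also provides `math.dist(A, B)`, which returns the same value for these inputs.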
How to Interpret L2 Distance Scores
L2 is a distance metric, so lower values are closer.
- 0 means identical vectors
- 0.5 is closer than 2.0
- 2.0 is closer than 10.0
- higher distance means lower similarity under this metric
Do not read L2 distance like a confidence percentage. A distance of 0.8 does not mean 80 percent similar. It is a raw distance value in the vector space.
L2 vs Squared L2
Many vector databases expose squared L2 instead of ordinary L2.
Squared L2 uses this calculation:
sum((Ai - Bi)^2)
Ordinary L2 takes the square root at the end:
sqrt(sum((Ai - Bi)^2))
Squared L2 is often slightly cheaper to compute because it avoids the square root operation. For nearest-neighbor ranking, squared L2 preserves the same order as ordinary L2, because the square root never changes which of two non-negative values is larger.
If one vector is closest by L2, it will also be closest by squared L2.
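A small sketch can confirm that both versions sort candidates the same way. The candidate names and vectors are invented for illustration.

```python
import math

def squared_l2(a, b):
    """Sum of squared component differences (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def l2(a, b):
    """Ordinary L2: square root of the squared L2 value."""
    return math.sqrt(squared_l2(a, b))

query = [0.0, 0.0]
candidates = {"p": [3.0, 4.0], "q": [1.0, 1.0], "r": [6.0, 8.0]}

# Sort candidate IDs from closest to farthest under each variant.
order_l2 = sorted(candidates, key=lambda k: l2(query, candidates[k]))
order_sq = sorted(candidates, key=lambda k: squared_l2(query, candidates[k]))

print(order_l2 == order_sq)  # True
```

The order matches, but the values do not: candidate `p` has an L2 distance of 5.0 and a squared L2 distance of 25.0.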
Why Metric Choice Matters
The distance metric affects ranking.
The same query and dataset can produce different result orders depending on whether the system uses L2 distance, cosine distance, or dot product based scoring.
This matters for:
- semantic search
- RAG retrieval
- recommendations
- clustering
- duplicate detection
- image or multimodal search
The best metric is usually the one expected by the embedding model or proven by relevance testing.
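As a toy illustration of how the metric changes result order, the sketch below ranks the same three vectors under L2, cosine distance, and dot product. The vectors and names are invented to make the effect visible. Real embeddings may behave less dramatically.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query = [1.0, 0.0]
vectors = {
    "far_aligned": [10.0, 0.0],   # same direction as the query, much longer
    "near_offaxis": [1.0, 1.0],   # close in space, different direction
    "middle": [2.0, 1.0],
}

# Distances sort ascending; dot product sorts descending (higher = closer).
by_l2 = sorted(vectors, key=lambda k: math.dist(query, vectors[k]))
by_cosine = sorted(vectors, key=lambda k: cosine_distance(query, vectors[k]))
by_dot = sorted(vectors, key=lambda k: dot(query, vectors[k]), reverse=True)

print("L2:    ", by_l2)
print("cosine:", by_cosine)
print("dot:   ", by_dot)
```

In this example L2 ranks `near_offaxis` first, while cosine distance and dot product both rank `far_aligned` first.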
L2 Distance vs Cosine Distance
L2 distance measures straight-line coordinate distance. Cosine distance measures angle or direction.
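This means L2 is sensitive to vector magnitude: two vectors that point the same way but differ in length can still be far apart under L2. Cosine distance ignores magnitude and depends only on whether the vectors point in a similar direction.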
For many text embedding models, cosine distance is common. For other models, L2 or squared L2 may be appropriate.
Neither metric is universally better. The right metric depends on the embedding model and the retrieval task.
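One useful special case: when every vector is normalized to unit length, squared L2 equals 2 times the cosine distance, so both metrics produce the same ranking. The sketch below checks this identity numerically. It assumes cosine distance is defined as 1 minus cosine similarity, which is a common convention but not the only one.

```python
import math

def normalize(v):
    """Scale a non-zero vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

a = normalize([3.0, 4.0])
b = normalize([10.0, 0.0])

lhs = squared_l2(a, b)
rhs = 2.0 * cosine_distance(a, b)
print(math.isclose(lhs, rhs))  # True
```

If the vectors are not normalized, this identity does not hold and the two metrics can rank results differently.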
L2 Distance vs Dot Product
Dot product based scoring multiplies matching components and sums the results, so it rewards both alignment and large magnitudes. In many systems, dot product behaves more like a similarity score, while L2 behaves like a distance.
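This can make score interpretation confusing. With L2, smaller is closer. With a raw dot product, larger is closer, but some databases negate or otherwise transform the value so it can be sorted like a distance, so the direction depends on how the database exposes the score.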
Always check whether the returned number is a distance or a similarity value.
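One way to keep this explicit in application code is to carry the score direction alongside the results. The sketch below is hypothetical: `sort_hits` and the `higher_is_closer` flag are illustrative names, not part of any database client.

```python
def sort_hits(hits, higher_is_closer):
    """Sort (id, score) pairs so the best match comes first.

    higher_is_closer=False suits distances such as L2.
    higher_is_closer=True suits similarity scores such as a raw dot product.
    """
    return sorted(hits, key=lambda hit: hit[1], reverse=higher_is_closer)

# The same made-up numbers, read two different ways.
hits = [("a", 0.2), ("b", 1.5), ("c", 0.9)]

as_distances = sort_hits(hits, higher_is_closer=False)
as_similarities = sort_hits(hits, higher_is_closer=True)

print([hit_id for hit_id, _ in as_distances])     # ['a', 'c', 'b']
print([hit_id for hit_id, _ in as_similarities])  # ['b', 'c', 'a']
```

Reading a distance as a similarity reverses the result order, which is why the check matters.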
When to Use the L2 Distance Metric
L2 can be a good choice when:
- the embedding model was trained or evaluated with Euclidean distance
- vector magnitude carries useful information
- the application expects coordinate-distance behavior
- the index supports efficient L2 or squared L2 search
- relevance tests show better results with L2 than with alternatives
Do not choose L2 only because it is familiar from geometry. Choose it because it matches the data and model.
How L2 Affects Vector Database Configuration
In many vector databases, the distance metric is part of the collection or index configuration.
That choice can affect:
- how vectors are indexed
- how query candidates are ranked
- what distance values are returned
- whether score thresholds are meaningful
- whether results match model expectations
Changing the metric after data is indexed may require rebuilding or re-evaluating the index. It should be treated as a search-quality decision, not just a syntax option.
Using L2 Thresholds
Some systems allow a distance threshold, such as “return only results closer than this value.”
With L2, threshold values are model- and data-dependent. A good threshold for one embedding model may be meaningless for another. A threshold for ordinary L2 is also not the same as a threshold for squared L2.
Set thresholds using real examples, not guesses.
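The sketch below shows why L2 and squared L2 thresholds are not interchangeable. The function names, vectors, and the threshold of 2.5 are made up for illustration. An L2 threshold of t corresponds to a squared L2 threshold of t squared.

```python
import math

def within_l2(query, vectors, max_distance):
    """IDs of vectors whose L2 distance to the query is at most max_distance."""
    return [vid for vid, v in vectors.items() if math.dist(query, v) <= max_distance]

def within_squared_l2(query, vectors, max_squared):
    """IDs of vectors whose squared L2 distance is at most max_squared."""
    return [
        vid for vid, v in vectors.items()
        if sum((x - y) ** 2 for x, y in zip(query, v)) <= max_squared
    ]

query = [0.0, 0.0]
vectors = {"a": [1.0, 0.0], "b": [3.0, 4.0], "c": [0.0, 2.0]}

threshold = 2.5  # illustrative value only
print(within_l2(query, vectors, threshold))               # ['a', 'c']
print(within_squared_l2(query, vectors, threshold ** 2))  # ['a', 'c']
print(within_squared_l2(query, vectors, threshold))       # ['a']
```

Reusing the L2 threshold unchanged against squared L2 scores silently drops `c`, whose squared distance is 4.0.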
Common Mistakes
Common mistakes include:
- assuming higher L2 distance means better similarity
- mixing L2 and squared L2 scores
- using L2 with a model designed for cosine similarity without testing
- comparing raw L2 distances across different embedding models
- treating L2 scores as percentages
- changing the metric without rebuilding evaluation baselines
Summary
The L2 distance metric measures straight-line distance between two vectors. In vector search, lower L2 distance means the stored vector is closer to the query vector under that metric.
Many systems use squared L2 for efficiency, which keeps nearest-neighbor ranking but changes the scale of the returned value.
L2 is useful when it matches the embedding model and retrieval task. It should be chosen through model guidance and relevance testing, not treated as interchangeable with cosine distance or dot product.