L2 Similarity vs L2 Distance

L2 distance is a distance measure, not a similarity score. That distinction matters when reading vector search results.

With L2 distance, lower values mean vectors are closer. If someone says “L2 similarity,” they usually mean a similarity score derived from L2 distance, not the raw L2 distance itself.

Short Answer

The difference is:

  • L2 distance: a raw distance value where lower means closer
  • L2 similarity: a transformed score where higher is usually intended to mean more similar

Most vector databases naturally calculate and return L2 as a distance. Similarity requires an extra conversion or normalization step.

What L2 Distance Means

L2 distance measures the straight-line distance between two vectors.

For two vectors A and B, ordinary L2 distance is:

sqrt(sum((Ai - Bi)^2))

Many vector databases use squared L2 distance instead, because it skips the square root and still ranks results in the same order:

sum((Ai - Bi)^2)

In both cases, the interpretation is the same:

  • 0 means identical vectors
  • smaller values mean closer vectors
  • larger values mean farther vectors
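Both formulas are short enough to write directly. The sketch below uses plain Python with no vector library. The function names are illustrative and are not part of any database API. The example pair differs by 3 and 4 in its two coordinates, so ordinary L2 is 5 and squared L2 is 25.

```python
import math
from collections.abc import Sequence


def squared_l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared L2 distance: sum((Ai - Bi)^2), with no final square root."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Ordinary L2 (Euclidean) distance: sqrt(sum((Ai - Bi)^2))."""
    return math.sqrt(squared_l2_distance(a, b))


a = [1.0, 2.0]
b = [4.0, 6.0]
print(l2_distance(a, b))          # 5.0
print(squared_l2_distance(a, b))  # 25.0
print(l2_distance(a, a))          # 0.0 (identical vectors)
```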

Why L2 Is Not Naturally a Similarity Score

A similarity score usually feels like “higher is better.” L2 distance works the other way around.

Example:

  • Vector A distance: 0.4
  • Vector B distance: 2.1
  • Vector C distance: 9.8

Vector A is the closest result because 0.4 is the lowest distance.

If you sort those values as if higher means better, you would rank the worst result first.
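Here is a minimal sketch of the two sort directions, using the three example distances above. The vector labels are just dictionary keys for illustration.

```python
# Distances from the example above: lower means closer.
distances = {"Vector A": 0.4, "Vector B": 2.1, "Vector C": 9.8}

# Correct for L2: ascending order puts the closest vector first.
correct = sorted(distances, key=distances.get)

# Mistake: descending order treats distance as if higher were better.
wrong = sorted(distances, key=distances.get, reverse=True)

print(correct)  # ['Vector A', 'Vector B', 'Vector C']
print(wrong)    # ['Vector C', 'Vector B', 'Vector A']
```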

What People Mean by L2 Similarity

When people say “L2 similarity,” they may mean one of several things:

  • the inverse of L2 distance
  • a normalized distance converted into a score
  • a custom score such as 1 / (1 + distance)
  • a product-specific similarity value based on Euclidean distance

These are not the same as raw L2 distance. They are derived values.

Example Conversion

A simple way to turn distance into a similarity-like score is:

similarity = 1 / (1 + distance)

If the distance is 0:

1 / (1 + 0) = 1

If the distance is 4:

1 / (1 + 4) = 0.2

This makes closer vectors produce higher scores. But this is only one possible conversion. It is not a universal standard.
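The 1 / (1 + distance) conversion can be sketched as a small function. The name distance_to_similarity is hypothetical, and the formula is only the example conversion from this section, not a standard any particular database uses.

```python
def distance_to_similarity(distance: float) -> float:
    """Map an L2 distance to a score in (0, 1] using 1 / (1 + distance).

    This is one possible conversion, not a universal standard.
    """
    if distance < 0:
        raise ValueError("L2 distances are never negative")
    return 1.0 / (1.0 + distance)


print(distance_to_similarity(0))  # 1.0
print(distance_to_similarity(4))  # 0.2
```

Because the function is strictly decreasing, sorting by this score in descending order gives the same ranking as sorting the raw distances in ascending order. Only the scale and direction change.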

Why Conversions Are Not Universal

Raw L2 distances depend on the embedding model, vector dimensions, normalization, data distribution, and whether the score is ordinary L2 or squared L2.

Because of that, a distance of 2.0 can mean different things in different systems.

A conversion that works for one model may not work for another. Thresholds and similarity mappings should be created from real validation examples, not copied blindly.
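One way to derive a threshold from validation examples is to try each observed distance as a cutoff and keep the one that classifies the most labeled pairs correctly. This is a minimal sketch under two assumptions: plain accuracy is the right criterion (a real system may prefer a precision or recall target), and the labeled pairs below are hypothetical placeholders for your own validation data.

```python
def pick_threshold(labeled: list[tuple[float, bool]]) -> float:
    """Return the distance cutoff that best separates matches from non-matches.

    `labeled` holds (distance, is_match) pairs from validation examples.
    A pair is predicted to match when distance <= threshold.
    """
    if not labeled:
        raise ValueError("need at least one labeled example")
    best_threshold, best_correct = 0.0, -1
    # Every observed distance is a candidate cutoff.
    for threshold in sorted({d for d, _ in labeled}):
        correct = sum((d <= threshold) == is_match for d, is_match in labeled)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


# Hypothetical validation pairs for one embedding model and one metric.
validation = [
    (0.3, True), (0.5, True), (0.9, True), (1.1, True),
    (1.4, False), (2.0, False), (2.6, False),
]
print(pick_threshold(validation))  # 1.1
```

The result only applies to the model, metric, and data it was computed from. Rerun it when any of those change.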

L2 Distance vs Squared L2 Distance

Squared L2 distance is ordinary L2 distance without the final square root.

Example:

  • ordinary L2 distance: 5
  • squared L2 distance: 25

The ranking order is the same, because squaring preserves the order of non-negative values, but the scale is different. This matters if you are converting distance into a similarity score.

A conversion designed for ordinary L2 should not automatically be applied to squared L2 without testing.
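A short sketch makes the scale problem concrete. It applies the 1 / (1 + distance) example conversion to the same pair of vectors measured as ordinary L2 (5) and squared L2 (25), then checks that the ranking itself does not change. The helper names are illustrative.

```python
def distance_to_similarity(distance: float) -> float:
    """The 1 / (1 + distance) example conversion."""
    return 1.0 / (1.0 + distance)


def rank_order(values: list[float]) -> list[int]:
    """Indices of `values` sorted from closest (smallest) to farthest."""
    return sorted(range(len(values)), key=values.__getitem__)


ordinary = 5.0
squared = ordinary ** 2  # 25.0

# Same pair of vectors, very different "similarity" depending on the scale.
print(round(distance_to_similarity(ordinary), 4))  # 0.1667
print(round(distance_to_similarity(squared), 4))   # 0.0385

# Ranking is unaffected: squaring preserves the order of non-negative distances.
ordinary_distances = [5.0, 1.0, 3.0]
squared_distances = [d ** 2 for d in ordinary_distances]
print(rank_order(ordinary_distances) == rank_order(squared_distances))  # True
```

A display threshold such as "show results scoring above 0.1" would keep this pair under ordinary L2 and drop it under squared L2, even though nothing about the vectors changed.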

L2 Distance vs Cosine Similarity

Cosine similarity is more naturally discussed as a similarity score because higher cosine similarity means vectors point in a more similar direction.

Cosine distance is the distance-style version, often expressed as:

1 - cosine similarity

So with cosine:

  • cosine similarity: higher is more similar
  • cosine distance: lower is more similar

L2 starts as a distance measure. If you need a similarity score, you must define how to convert it.
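The sketch below computes cosine similarity and cosine distance next to L2 distance for one pair of vectors. It shows why the values cannot be compared directly. The two vectors point in exactly the same direction, so cosine similarity is at its maximum, but they are still one unit apart in L2 distance. The function names are illustrative.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: higher means a more similar direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Distance-style version: 1 - cosine similarity, so lower means more similar."""
    return 1.0 - cosine_similarity(a, b)


def l2_distance(a: list[float], b: list[float]) -> float:
    """Ordinary L2 distance, for comparison."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a, b = [1.0, 0.0], [2.0, 0.0]
print(cosine_similarity(a, b))  # 1.0 (same direction)
print(cosine_distance(a, b))    # 0.0
print(l2_distance(a, b))        # 1.0 (still one unit apart)
```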

How to Read Search Results

When looking at vector search results, first identify what the database is returning.

Ask:

  • Is this field a distance or a similarity score?
  • Does lower mean better or higher mean better?
  • Is the value ordinary L2 or squared L2?
  • Was the embedding model trained for L2, cosine, or dot product?
  • Are thresholds based on real examples?

This avoids many ranking and filtering mistakes.

Common Mistakes

Common mistakes include:

  • sorting L2 distance in descending order
  • calling raw L2 distance a similarity score
  • treating L2 values as percentages
  • using the same threshold across different embedding models
  • mixing ordinary L2 and squared L2 in one scoring rule
  • comparing L2 scores directly with cosine similarity scores

When a Similarity Conversion Is Useful

A similarity conversion can be useful when:

  • the user interface expects higher-is-better scores
  • multiple retrieval systems need a common display format
  • you need to combine L2-based retrieval with other scoring signals
  • non-technical users need easier score interpretation

Even then, the conversion should be documented. The system should keep the raw distance available for debugging and evaluation.
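One way to follow that advice is to store the raw distance and its metric on every result and derive the display score from them, rather than storing only the score. This is a minimal sketch. The SearchHit class and its field names are hypothetical, and the 1 / (1 + distance) conversion is the example one from earlier, not a recommended standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchHit:
    """A search result that keeps the raw distance next to its display score."""

    doc_id: str
    raw_distance: float  # exactly what the database returned
    metric: str          # e.g. "l2" or "squared_l2"; records which scale the value uses

    @property
    def display_score(self) -> float:
        # Documented conversion for the UI: 1 / (1 + distance), higher is better.
        # Evaluation and debugging should read raw_distance instead.
        return 1.0 / (1.0 + self.raw_distance)


hit = SearchHit(doc_id="doc-1", raw_distance=4.0, metric="l2")
print(hit.raw_distance)   # 4.0
print(hit.display_score)  # 0.2
```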

Practical Rule

Use this rule:

  • If the value is L2 distance, sort ascending.
  • If the value is squared L2 distance, sort ascending.
  • If you need a similarity score, define and test a conversion.
  • Do not compare L2 distance directly with cosine similarity.
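The rule can be encoded as a lookup from metric name to sort direction, so callers cannot sort a distance the wrong way by accident. In this sketch, the metric names and the rank function are hypothetical labels, not identifiers from any database. The function ranks one metric at a time and rejects metrics whose direction has not been defined.

```python
# Hypothetical metric names mapped to whether a lower value means a better match.
LOWER_IS_BETTER = {
    "l2": True,
    "squared_l2": True,
    "cosine_distance": True,
    "cosine_similarity": False,
}


def rank(results: list[tuple[str, float]], metric: str) -> list[tuple[str, float]]:
    """Sort (doc_id, value) pairs best-first according to the metric's direction."""
    if metric not in LOWER_IS_BETTER:
        raise ValueError(f"unknown metric {metric!r}; define its direction first")
    return sorted(results, key=lambda r: r[1], reverse=not LOWER_IS_BETTER[metric])


print(rank([("a", 2.1), ("b", 0.4)], "l2"))
# [('b', 0.4), ('a', 2.1)]
print(rank([("a", 0.35), ("b", 0.91)], "cosine_similarity"))
# [('b', 0.91), ('a', 0.35)]
```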

Summary

L2 distance is a raw distance value where lower means closer. L2 similarity is usually a derived score that converts distance into a higher-is-better value.

The distinction matters because sorting, thresholds, user-facing scores, and retrieval evaluation can all break if distance is treated like similarity.

For vector search, keep the raw meaning clear: L2 is naturally a distance metric. Smaller values mean more similar under that metric.