Cosine Similarity vs L2 Distance

Cosine similarity and L2 distance are two common ways to compare embeddings, but they do not measure the same thing. Cosine compares vector direction. L2 distance compares coordinate distance.

That difference matters because a vector search system ranks results by the metric you choose. The same query and the same stored vectors can produce different result orders under cosine similarity and L2 distance.

Short Answer

Use cosine similarity when the angle or direction of the vectors is the main signal. Use L2 distance when the straight-line distance between vector coordinates is the main signal.

For most embedding systems, the best starting point is still the metric recommended by the embedding model. If the model was trained or evaluated with cosine, start with cosine. If it was trained or evaluated with Euclidean distance, start with L2 or squared L2.

What Cosine Similarity Measures

Cosine similarity measures the angle between two vectors.

If two vectors point in the same direction, they are considered similar, even if one vector has a larger magnitude than the other. This is why cosine is common in text and semantic search systems: two pieces of text may express similar meaning even if their vector lengths differ.

cosine_similarity(a, b) = dot(a, b) / (||a|| * ||b||)

Cosine similarity is usually interpreted as a similarity score: higher means more similar.

Some vector databases expose cosine distance instead:

cosine_distance(a, b) = 1 - cosine_similarity(a, b)

Cosine distance is interpreted as a distance: lower means more similar.
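The two formulas above can be sketched in plain Python. This is a minimal illustration using only the standard library, not how any particular vector database computes the metric. The function names simply mirror the formulas, and the zero-vector check is an added assumption, since the angle is undefined when either vector has length 0.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; higher means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as an error rather than returning 0.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """1 - cosine similarity; lower means more similar."""
    return 1.0 - cosine_similarity(a, b)
```

With this definition, vectors pointing the same way have a distance near 0, perpendicular vectors have a distance of 1, and vectors pointing in opposite directions have a distance of 2.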

What L2 Distance Measures

L2 distance, also called Euclidean distance, measures straight-line distance between two vectors in coordinate space.

It looks at how far apart the vector components are, dimension by dimension.

L2(a, b) = sqrt(sum((a_i - b_i)^2))

Some systems use squared L2 distance:

squared_L2(a, b) = sum((a_i - b_i)^2)

Squared L2 keeps the same nearest-neighbor ordering as ordinary L2 because the square root is a monotonically increasing function, but the numeric values are different because the square root is omitted.

L2 is a distance score: lower means more similar.
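As a rough sketch, both formulas fit in a few lines of standard-library Python. The vectors and the names `query` and `candidates` are made up for illustration. The point is only that sorting by L2 and sorting by squared L2 give the same order.

```python
import math

def squared_l2(a, b):
    """Sum of squared per-dimension differences; lower means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum((x - y) ** 2 for x, y in zip(a, b))

def l2(a, b):
    """Euclidean (straight-line) distance; lower means more similar."""
    return math.sqrt(squared_l2(a, b))

# Illustrative data, not taken from any real model.
query = [0.0, 0.0]
candidates = {"far": [6.0, 8.0], "near": [1.0, 1.0], "mid": [3.0, 4.0]}

# Sorting ascending puts the closest candidate first under either score.
by_l2 = sorted(candidates, key=lambda name: l2(query, candidates[name]))
by_squared_l2 = sorted(candidates, key=lambda name: squared_l2(query, candidates[name]))
```

Skipping the square root saves a little work per comparison, which is one plausible reason a system might expose squared L2 instead of L2.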

The Main Difference

The main difference is that cosine cares about direction, while L2 cares about position.

Imagine two vectors that point in nearly the same direction, but one is much longer than the other. Cosine may treat them as very similar because the angle is small. L2 may treat them as farther apart because their coordinates are not close.

Now imagine two vectors that sit close together in coordinate space but point in different directions, such as two short vectors on opposite sides of the origin. L2 ranks them as close, while cosine considers them dissimilar.

Simple Example

Consider these two vectors:

a = [1, 1]
b = [10, 10]

They point in exactly the same direction, so the angle between them is zero and their cosine similarity is 1, the maximum possible value.

But their L2 distance is sqrt(9^2 + 9^2) = sqrt(162), about 12.73, because the coordinate values are far apart.

This is the core difference: cosine ignores magnitude entirely, while L2 keeps it.
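To see the effect of magnitude more generally, the sketch below stretches `a` by several factors and records both scores. It assumes Python 3.8 or later for `math.dist` and the multi-argument form of `math.hypot`. The scale factors are arbitrary.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

a = [1.0, 1.0]
rows = []
for scale in (1, 2, 10, 100):
    b = [scale * x for x in a]  # same direction as a, different length
    rows.append((scale, cosine_similarity(a, b), math.dist(a, b)))

for scale, cos, dist in rows:
    print(f"scale={scale:>3}  cosine={cos:.4f}  L2={dist:.4f}")
```

Cosine stays at 1 (up to floating-point rounding) for every scale factor, while L2 grows steadily as `b` gets longer.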

Why This Matters for Embeddings

Embedding values are not ordinary user-facing features. They are learned representations created by a model.

That means the metric should match how the model expects vectors to be compared. If the model learned a space where direction carries the semantic meaning, cosine may work well. If the model learned a space where coordinate distance carries the meaning, L2 may work better.

Changing the metric can change what the system considers relevant.

Cosine Similarity in Semantic Search

Cosine similarity is often used for text embeddings and semantic search because it compares orientation in the embedding space.

This can be useful when documents and queries should match by meaning rather than by vector length. For example, a short question and a longer document passage might still point in a similar semantic direction.

Cosine is a strong default when:

  • the embedding model recommends cosine
  • vectors are normalized or expected to be compared by angle
  • meaning is represented mostly by direction
  • magnitude is not intended to change ranking much

L2 Distance in Vector Search

L2 distance is useful when actual coordinate distance matters.

Some models, feature spaces, and indexing methods are designed around Euclidean distance. In those cases, L2 can be the correct metric even if cosine is more common in text search examples.

L2 is a strong default when:

  • the embedding model recommends Euclidean or L2 distance
  • the vector space was trained with an L2-style loss
  • coordinate distance has meaningful interpretation
  • evaluation shows stronger recall or ranking than cosine

Normalized Vectors Change the Comparison

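Normalization rescales every vector to the same length, usually length 1.

When vectors are normalized to unit length, cosine similarity and L2 distance are tied together by an exact identity:

squared_L2(a, b) = 2 - 2 * cosine_similarity(a, b)

L2 distance is then a decreasing function of cosine similarity, so an exact nearest-neighbor search returns the same ranking under both metrics. Only the numeric scores differ, because magnitude no longer changes the comparison.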

Without normalization, cosine and L2 can behave very differently.

This is why you should always know whether your embedding pipeline normalizes vectors before indexing and querying.
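A small check can make the effect of normalization concrete. The sketch below assumes unit-length (L2) normalization and uses made-up three-dimensional vectors. For unit vectors the squared L2 distance equals 2 minus twice the cosine similarity, so the two rankings should match.

```python
import math

def normalize(v):
    """Rescale v to unit L2 length."""
    norm = math.hypot(*v)
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Illustrative vectors, not from a real model.
query = normalize([1.0, 2.0, 0.5])
docs = {
    "d1": normalize([2.0, 4.0, 1.0]),
    "d2": normalize([0.0, 1.0, 3.0]),
    "d3": normalize([-1.0, 0.5, 0.0]),
}

for name, vec in docs.items():
    cos = dot(query, vec)  # for unit vectors, the dot product is the cosine
    sq_l2 = math.dist(query, vec) ** 2
    # Identity for unit vectors: squared L2 = 2 - 2 * cosine similarity
    assert math.isclose(sq_l2, 2.0 - 2.0 * cos, abs_tol=1e-9)

by_cosine = sorted(docs, key=lambda n: dot(query, docs[n]), reverse=True)
by_l2 = sorted(docs, key=lambda n: math.dist(query, docs[n]))
```

If the same check fails on your own data, a likely cause is that some vectors were indexed or queried without normalization.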

Score Direction Can Be Confusing

Cosine similarity and L2 distance sort in opposite directions.

  • Cosine similarity: higher is closer.
  • Cosine distance: lower is closer.
  • L2 distance: lower is closer.
  • Squared L2 distance: lower is closer.

Before sorting or applying thresholds, confirm whether your database returns a similarity score or a distance score.
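One way to guard against sorting the wrong way is to make the score type explicit in code. The helper below is a hypothetical sketch. The labels `cosine_similarity`, `cosine_distance`, `l2`, and `squared_l2` are placeholders, and a real database will use its own names for these score types.

```python
# Hypothetical labels for what a database returns; real systems use their own names.
HIGHER_IS_CLOSER = {"cosine_similarity"}
LOWER_IS_CLOSER = {"cosine_distance", "l2", "squared_l2"}

def sort_hits(hits, score_kind):
    """Return (id, score) pairs ordered from closest to farthest."""
    if score_kind in HIGHER_IS_CLOSER:
        return sorted(hits, key=lambda hit: hit[1], reverse=True)
    if score_kind in LOWER_IS_CLOSER:
        return sorted(hits, key=lambda hit: hit[1])
    raise ValueError(f"unknown score kind: {score_kind!r}")

# The same raw numbers produce opposite orders depending on what they mean.
hits = [("doc-a", 0.91), ("doc-b", 0.12), ("doc-c", 0.55)]
as_similarity = sort_hits(hits, "cosine_similarity")
as_distance = sort_hits(hits, "cosine_distance")
```

Raising an error on an unknown score type is a deliberate choice here: it is safer to fail than to guess the sort direction.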

Thresholds Are Not Interchangeable

A cosine threshold cannot be reused as an L2 threshold.

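Cosine distance has a bounded range: 0 to 2 when it is defined as 1 - cosine similarity. L2 distance is unbounded and grows with vector scale, and it often grows with dimensionality as well. Squared L2 is on a different numeric scale from L2 for the same pair of vectors: it is larger when the L2 distance is above 1 and smaller when it is below 1.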

If you switch from cosine to L2, rebuild your thresholds from real score distributions.
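One possible way to rebuild a threshold is to collect distances for query and document pairs already judged relevant, then pick the cutoff that keeps a chosen fraction of them. The sketch below assumes a lower-is-closer score. The function name `threshold_for_recall` and the sample distances are invented for illustration.

```python
import math

def threshold_for_recall(relevant_distances, target_recall):
    """Smallest distance cutoff that keeps at least target_recall of the relevant pairs."""
    if not relevant_distances:
        raise ValueError("need at least one observed distance")
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    ordered = sorted(relevant_distances)
    keep = math.ceil(target_recall * len(ordered))
    return ordered[keep - 1]

# Made-up L2 distances between queries and documents judged relevant.
observed_l2 = [0.42, 0.55, 0.61, 0.70, 0.74, 0.83, 0.91, 1.05, 1.20, 1.48]
cutoff = threshold_for_recall(observed_l2, 0.9)
```

A cutoff chosen this way only controls how many relevant pairs pass. Checking how many irrelevant pairs also pass requires the same kind of sample for non-relevant pairs.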

How Rankings Can Change

Cosine may prefer results that point in the same semantic direction as the query, even if the vector magnitudes differ.

L2 may prefer results that are closest in coordinate space, even if another vector has a more similar angle.

In a RAG system, that can change which chunks are retrieved, which context enters the prompt, and which answer the model produces.
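A two-candidate toy case shows how the top result can flip. The vectors and chunk names below are invented. One chunk points in exactly the query's direction but is far away, and the other is close by but 45 degrees off. The sketch assumes Python 3.8 or later for `math.dist`.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

query = [1.0, 0.0]
chunks = {
    "same_direction_far": [10.0, 0.0],        # angle 0, but 9 units away
    "different_direction_near": [1.0, 1.0],   # 45 degrees off, 1 unit away
}

# Cosine picks the highest similarity; L2 picks the smallest distance.
top_by_cosine = max(chunks, key=lambda n: cosine_similarity(query, chunks[n]))
top_by_l2 = min(chunks, key=lambda n: math.dist(query, chunks[n]))
```

Here cosine selects `same_direction_far` and L2 selects `different_direction_near`, so a retriever limited to one chunk would pass different context to the model depending on the metric.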

Which One Is Better?

Neither metric is universally better.

Cosine similarity is a good fit for many text embedding systems. L2 distance is a good fit when the model or representation was designed around Euclidean geometry.

The correct question is not, “Is cosine better than L2?” The correct question is, “Which metric matches this embedding model and retrieves the best results for this application?”

Practical Evaluation

To compare cosine and L2 fairly, use the same dataset, queries, filters, and index settings where possible.

Evaluate:

  • top-k result relevance
  • recall at k
  • ranking order for hard queries
  • behavior on short and long queries
  • filtered search behavior
  • RAG answer quality
  • threshold stability

If one metric performs better on real user tasks, that result is more important than a general rule.
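As one concrete piece of such an evaluation, recall at k can be computed from ranked result IDs and a set of hand-labeled relevant IDs. The sketch below is a minimal version for a single query. The result lists and chunk IDs are hypothetical, not measurements from any system.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant items that appear in the top k retrieved results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    found = relevant.intersection(retrieved_ids[:k])
    return len(found) / len(relevant)

# Hypothetical results for one query under each metric, with hand-labeled relevance.
relevant = {"c1", "c4", "c7"}
cosine_results = ["c1", "c4", "c2", "c7", "c9"]
l2_results = ["c2", "c1", "c9", "c4", "c7"]

cosine_recall = recall_at_k(cosine_results, relevant, k=3)
l2_recall = recall_at_k(l2_results, relevant, k=3)
```

In practice the comparison would average this score over many queries, since a single query says little about which metric retrieves better overall.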

Common Mistakes

Common mistakes include:

  • using cosine because it is popular, without checking the model
  • using L2 because it feels geometrically intuitive, without testing retrieval quality
  • forgetting that cosine similarity and cosine distance sort in opposite directions
  • mixing normalized and unnormalized vectors
  • copying thresholds from cosine to L2
  • assuming squared L2 scores have the same numeric scale as L2 scores

Summary

Cosine similarity compares vector direction. L2 distance compares coordinate distance.

Cosine is often useful for semantic text embeddings because it focuses on angular similarity. L2 is useful when Euclidean distance is the metric the model or vector space expects.

For production search, start with the embedding model’s recommended metric, then validate the choice with real queries and retrieval quality tests.