What Is L2 Distance?

L2 distance is a way to measure how far apart two vectors are. In vector search, it helps a database decide which stored vectors are closest to a query vector.

If two vectors are close by L2 distance, they are considered more similar under that distance metric. If they are far apart, they are considered less similar.

L2 distance is also called Euclidean distance. It is the straight-line distance between two points in space.

Simple Definition

L2 distance measures the direct distance between two vectors.

Imagine two points on a map. The L2 distance is the length of the straight line from one point to the other. In vector search, the same idea applies, but the points may exist in hundreds or thousands of dimensions instead of two.

A smaller L2 distance means the vectors are closer. A larger L2 distance means the vectors are farther apart.

Why L2 Distance Matters in Vector Search

Vector databases store embeddings. An embedding is a vector that represents the meaning or features of an object.

When a user searches, the query is also converted into a vector. The database compares the query vector against stored vectors and ranks results by distance.

With L2 distance:

distance of 0 means the vectors are identical
a small distance means the vectors are close
a large distance means the vectors are far apart
lower distance usually means higher similarity

This makes L2 distance one possible mathematical foundation for semantic search, similarity search, recommendations, clustering, and retrieval systems.
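
As a rough illustration of the ranking step described above, the sketch below scores a handful of stored vectors against a query and sorts them from closest to farthest. It uses `math.dist` from the Python standard library, which computes Euclidean distance. The document ids, the three-dimensional vectors, and the `rank_by_l2` helper are all made up for the example; a real vector database would use an index rather than scanning every vector.

```python
import math

def rank_by_l2(query, stored):
    """Return (id, distance) pairs sorted from closest to farthest."""
    scored = [(item_id, math.dist(query, vec)) for item_id, vec in stored.items()]
    return sorted(scored, key=lambda pair: pair[1])

# Hypothetical 3-dimensional "embeddings"; real ones have far more dimensions.
stored = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.0, 0.0]

ranking = rank_by_l2(query, stored)
for item_id, distance in ranking:
    print(item_id, round(distance, 3))
```

Here doc_a matches the query exactly, so its distance is 0 and it ranks first. doc_c points in a completely different direction and ranks last.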

A Tiny Example

Suppose we have two simple vectors:

A = [2, 3]

B = [5, 7]

On a two-dimensional graph, vector A is the point at (2, 3), and vector B is the point at (5, 7).

The difference between them is:

first dimension: 5 - 2 = 3
second dimension: 7 - 3 = 4

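Squaring each difference gives 9 and 16. These add up to 25, and the square root of 25 is 5. So the straight-line distance is 5. This is the same 3-4-5 right triangle many people learn in geometry.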

In real embedding systems, the vectors may have 384, 768, 1536, or more dimensions. The idea is the same, but the calculation uses many more coordinates.
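
The arithmetic in this example can be sketched in a few lines of Python. The variable names are chosen for illustration only. The same steps work unchanged for vectors with any number of dimensions, as long as both vectors have the same length.

```python
import math

a = [2, 3]
b = [5, 7]

# Step 1: difference in each dimension.
diffs = [y - x for x, y in zip(a, b)]

# Step 2: square each difference.
squared = [d ** 2 for d in diffs]

# Step 3: add the squares and take the square root.
distance = math.sqrt(sum(squared))

print(diffs)     # [3, 4]
print(squared)   # [9, 16]
print(distance)  # 5.0
```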

L2 Distance in Plain English

L2 distance asks: how much would you need to move from one vector to reach the other vector?

If the movement is small, the vectors are close. If the movement is large, the vectors are far apart.

For embeddings, closeness usually means the model considers the two items similar in the way it was trained to represent them.

L2 Distance vs Similarity

L2 is a distance metric, not a similarity score.

That means lower is better when searching for similar items. This can feel opposite to similarity scores, where higher is usually better.

For example:

distance = 0.1 is closer than distance = 2.0
distance = 0 means the vectors are identical
a higher distance means the vectors are less similar under L2

When reading vector search results, it is important to know whether the database is returning a distance or a similarity score.
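
A small sketch shows why this matters. The same three scores produce opposite orderings depending on whether they are read as distances or as similarities. The result rows, the field names `id` and `score`, and the `best_first` helper are assumptions made for the example, not the output format of any particular database.

```python
# Hypothetical result rows; "id" and "score" are assumed field names.
hits = [
    {"id": "doc_a", "score": 0.1},
    {"id": "doc_b", "score": 2.0},
    {"id": "doc_c", "score": 0.7},
]

def best_first(hits, score_is_distance):
    """Order hits best-first: ascending for distances, descending for similarities."""
    return sorted(hits, key=lambda h: h["score"], reverse=not score_is_distance)

as_distance = [h["id"] for h in best_first(hits, score_is_distance=True)]
as_similarity = [h["id"] for h in best_first(hits, score_is_distance=False)]

print(as_distance)    # ['doc_a', 'doc_c', 'doc_b']
print(as_similarity)  # ['doc_b', 'doc_c', 'doc_a']
```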

L2 Distance vs Squared L2 Distance

Many vector systems use squared L2 distance instead of ordinary L2 distance.

Squared L2 distance adds up the squared differences between vector components but does not take the square root at the end. Skipping the square root can make each comparison cheaper, and the result is often enough for ranking: the square root never changes which of two non-negative numbers is larger, so the order of nearest results stays the same.

For example, if one candidate is closer than another by L2 distance, it will also be closer by squared L2 distance.
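
The sketch below checks this on a few made-up two-dimensional vectors. The function names `squared_l2` and `l2` and the candidate labels are illustrative assumptions.

```python
import math

def squared_l2(a, b):
    """Sum of squared component differences, with no square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def l2(a, b):
    """Ordinary L2 distance: the square root of the squared L2 distance."""
    return math.sqrt(squared_l2(a, b))

query = [0.0, 0.0]
candidates = {"near": [1.0, 1.0], "mid": [2.0, 0.0], "far": [3.0, 4.0]}

by_l2 = sorted(candidates, key=lambda k: l2(query, candidates[k]))
by_squared = sorted(candidates, key=lambda k: squared_l2(query, candidates[k]))

print(by_l2)       # ['near', 'mid', 'far']
print(by_squared)  # ['near', 'mid', 'far']
```

The ordering is the same, but the values are not. The "far" candidate has an L2 distance of 5.0 and a squared L2 distance of 25.0. A distance cutoff chosen for ordinary L2 would need to be squared before it is applied to squared L2 values.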

The next article in this series explains the L2 formula and squared L2 formula in more detail.

When L2 Distance Is Useful

L2 distance is useful when the embedding model and index were designed to use Euclidean-style distance.

It is common in machine learning, clustering, nearest-neighbor search, and some vector search systems. It can work well when the magnitude of a vector carries meaning, not just its direction.

However, L2 is not always the right metric. The best metric depends on how the embedding model was trained and how its vectors are meant to be compared.

L2 Distance vs Cosine Similarity

Cosine similarity compares the angle between vectors. L2 distance compares their straight-line distance.

This distinction matters because two vectors can point in a similar direction but have different lengths. Cosine similarity depends only on direction and ignores length. L2 depends on the actual coordinate distance, so a difference in length counts as distance.
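
As a minimal sketch of that difference, the example below compares two made-up vectors that point in exactly the same direction but differ in length. The `cosine_similarity` helper is written out by hand for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (assumes neither is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

shorter = [3.0, 4.0]
longer = [6.0, 8.0]  # same direction, twice the length

cos = cosine_similarity(shorter, longer)
dist = math.dist(shorter, longer)

print(cos)   # 1.0
print(dist)  # 5.0
```

Cosine similarity treats the two vectors as a perfect match, while L2 reports a distance of 5 between them.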

For many text embedding models, cosine similarity or cosine distance is common. For other models or datasets, L2 may be appropriate.

The practical rule is simple: use the distance metric recommended for the embedding model or the metric used during training and evaluation.

L2 Distance vs Dot Product

Dot product combines alignment and magnitude in a single score: it grows when two vectors point in the same direction and when either vector gets longer. It is often treated as a similarity-style metric, while L2 is a distance metric.

With L2, smaller values mean closer vectors. With dot product-style scoring, interpretation depends on the database and whether it returns a similarity score or a transformed distance value.

This is why distance metrics should not be swapped casually. Changing the metric can change result ranking.
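
A toy case makes the ranking change concrete. In the sketch below, the query, the two candidates, and their labels are invented for the example. One candidate sits exactly on the query, and the other points the same way but is much longer.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

query = [1.0, 1.0]
candidates = {
    "same_point": [1.0, 1.0],
    "same_direction_longer": [5.0, 5.0],
}

# L2: the smallest distance wins.
nearest_by_l2 = min(candidates, key=lambda k: math.dist(query, candidates[k]))

# Dot product used as a similarity score: the largest value wins.
top_by_dot = max(candidates, key=lambda k: dot(query, candidates[k]))

print(nearest_by_l2)  # same_point
print(top_by_dot)     # same_direction_longer
```

L2 prefers the candidate that coincides with the query. Dot product prefers the longer vector, because its magnitude raises the score.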

How L2 Distance Affects Search Results

The distance metric affects which results appear near the top.

If a vector database uses L2 distance, it will prefer objects whose embeddings are closest to the query embedding by straight-line distance. A different metric, such as cosine distance or dot product, may rank the same objects differently.

This affects:

semantic search relevance
RAG context retrieval
recommendation results
clustering behavior
duplicate or near-duplicate detection

For production retrieval, metric choice should be tested with real queries and expected results.
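
One way to approach such a test is to collect queries with known expected results and measure how often each metric puts the expected item in the top k. The sketch below is a minimal version of that idea. The stored vectors, the test cases, and the `hit_rate_at_k` helper are all hypothetical, and the toy data was built so the two metrics disagree. It says nothing about which metric is better for a real model.

```python
import math

def l2(a, b):
    return math.dist(a, b)

def cosine_distance(a, b):
    """1 minus cosine similarity (assumes neither vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def hit_rate_at_k(metric, stored, test_cases, k):
    """Fraction of test cases whose expected id appears in the top k results."""
    hits = 0
    for query, expected_id in test_cases:
        ranked = sorted(stored, key=lambda item_id: metric(query, stored[item_id]))
        if expected_id in ranked[:k]:
            hits += 1
    return hits / len(test_cases)

# Hypothetical stored vectors and (query, expected id) pairs.
stored = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [4.0, 4.0]}
test_cases = [([0.9, 0.1], "a"), ([1.0, 1.0], "c")]

l2_rate = hit_rate_at_k(l2, stored, test_cases, k=1)
cosine_rate = hit_rate_at_k(cosine_distance, stored, test_cases, k=1)

print("L2 hit rate at 1:", l2_rate)
print("cosine hit rate at 1:", cosine_rate)
```

In practice the test set would come from real user queries and judged results, and the stored vectors would come from the embedding model actually in use.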

Common Mistakes

Common mistakes with L2 distance include:

assuming all embedding models should use L2
comparing L2 scores as if higher means better
mixing vectors from different embedding models
switching distance metrics without re-evaluating search quality
confusing ordinary L2 distance with squared L2 distance

These mistakes can make retrieval quality worse even if the database is working correctly.

Summary

L2 distance measures the straight-line distance between two vectors. In vector search, it helps rank stored vectors by how close they are to a query vector.

Lower L2 distance means the vectors are closer and usually more similar under that metric. A distance of zero means the vectors are identical.

L2 is one of several distance metrics used in vector databases. It should be chosen when it matches the embedding model and retrieval behavior the application needs.