L1 Distance vs L2 Distance

L1 distance and L2 distance are two ways to measure how far apart vectors are. Both can be used in machine learning and vector search, but they behave differently.

L1 distance is also called Manhattan distance or taxicab distance. L2 distance is also called Euclidean distance. The main difference is that L1 adds absolute differences, while L2 squares differences and takes a square root.

Short Answer

The difference is:

  • L1 distance: add the absolute differences between vector components
  • L2 distance: square the differences, add them, then take the square root

Both are distance metrics, so lower values mean closer vectors.

L1 Distance Formula

For two vectors A and B, L1 distance is:

L1(A, B) = |A1 - B1| + |A2 - B2| + ... + |An - Bn|

It adds the absolute difference in each dimension.

This is called Manhattan distance because it resembles movement through a city grid. You move along streets, not directly through buildings.
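
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name l1_distance is chosen here for the example and does not come from any particular library.

```python
from typing import Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute component differences (Manhattan distance)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(abs(x - y) for x, y in zip(a, b))


print(l1_distance([1, 2, 3], [4, 0, 3]))  # 3 + 2 + 0 = 5
```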

L2 Distance Formula

For two vectors A and B, L2 distance is:

L2(A, B) = sqrt((A1 - B1)^2 + (A2 - B2)^2 + ... + (An - Bn)^2)

It measures straight-line distance between two vector points.

This is called Euclidean distance because it comes from ordinary geometric distance.
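
As a rough sketch, the same formula in Python looks like this. The function name l2_distance is illustrative only. In Python 3.8 and later, the standard library's math.dist computes the same quantity.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Square the differences, add them, then take the square root."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(l2_distance([1, 2, 3], [4, 6, 3]))  # sqrt(9 + 16 + 0) = 5.0
```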

Simple Example

Use two vectors:

A = [2, 3]

B = [5, 7]

The component differences are:

5 - 2 = 3
7 - 3 = 4

L1 distance adds the absolute differences:

|3| + |4| = 7

L2 distance uses the straight-line formula:

sqrt(3^2 + 4^2) = sqrt(25) = 5

So for this example:

  • L1 distance is 7
  • L2 distance is 5
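
The arithmetic above can be checked with a short script. This sketch uses only the Python standard library; the variable names are chosen for the example.

```python
import math

A = [2, 3]
B = [5, 7]

# L1: add the absolute differences in each dimension.
l1 = sum(abs(a - b) for a, b in zip(A, B))

# L2: straight-line distance (math.dist requires Python 3.8+).
l2 = math.dist(A, B)

print(l1)  # 7
print(l2)  # 5.0
```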

How They Behave Differently

L1 distance adds each dimension’s absolute difference as is, so every unit of difference counts the same no matter which dimension it falls in.

L2 distance squares differences, so larger differences have more influence. A single large gap in one dimension can affect L2 more strongly than it affects L1.

For example, a difference of 10 contributes 10 to L1, but contributes 100 before the square root step in L2.
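
One way to see this is to compare two sets of component differences with the same total gap: one spread evenly, one concentrated in a single dimension. The helper names and numbers below are made up for illustration.

```python
import math


def l1_norm(diffs):
    """L1 distance, given the per-dimension differences."""
    return sum(abs(d) for d in diffs)


def l2_norm(diffs):
    """L2 distance, given the per-dimension differences."""
    return math.sqrt(sum(d * d for d in diffs))


spread = [5, 5]         # total gap of 10, split across two dimensions
concentrated = [10, 0]  # total gap of 10, all in one dimension

print(l1_norm(spread), l1_norm(concentrated))  # 10 10
print(l2_norm(spread), l2_norm(concentrated))  # about 7.07 and 10.0
```

L1 sees the two cases as equally far apart. L2 rates the concentrated gap as farther.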

Grid Distance vs Straight-Line Distance

A helpful mental model is city travel.

L1 distance is like walking through a street grid. You move horizontally and vertically, then add the travel distance.

L2 distance is like drawing a direct straight line between two points.

L2 is never longer than L1, in two dimensions or any other number of dimensions, because the straight line is the most direct path. The two are equal only when the vectors differ in at most one dimension.
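
The relationship between the two distances can be checked numerically. For vectors with n dimensions, L2 never exceeds L1, and L1 never exceeds sqrt(n) times L2 (a consequence of the Cauchy-Schwarz inequality). The sketch below tests both bounds on random vectors; the function name bounds_hold and the sampling ranges are arbitrary choices for the example.

```python
import math
import random


def bounds_hold(a, b, tol=1e-9):
    """Check L2 <= L1 <= sqrt(n) * L2 for one pair, with a float tolerance."""
    n = len(a)
    l1 = sum(abs(x - y) for x, y in zip(a, b))
    l2 = math.dist(a, b)
    return l2 <= l1 + tol and l1 <= math.sqrt(n) * l2 + tol


random.seed(0)  # fixed seed so the run is repeatable
pairs = []
for _ in range(1000):
    n = random.randint(1, 16)
    a = [random.uniform(-10, 10) for _ in range(n)]
    b = [random.uniform(-10, 10) for _ in range(n)]
    pairs.append((a, b))

print(all(bounds_hold(a, b) for a, b in pairs))  # True
```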

In Vector Search

In vector search, both L1 and L2 can compare a query vector with stored vectors.

If the database uses L1, it ranks candidates by the sum of absolute component differences.

If the database uses L2, it ranks candidates by Euclidean (straight-line) distance.

Both are distance metrics:

  • lower L1 means closer under L1
  • lower L2 means closer under L2
  • a distance of 0 means identical vectors
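
To make the ranking step concrete, here is a small exhaustive-scan sketch. It is not how a real vector database is implemented (those typically use specialized indexes), and the stored vectors, ids, and the rank helper are hypothetical. The toy data is chosen so that the two metrics disagree on the order.

```python
import math


def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def l2(a, b):
    return math.dist(a, b)


def rank(query, stored, metric):
    """Return stored ids sorted from closest to farthest under `metric`."""
    return sorted(stored, key=lambda vid: metric(query, stored[vid]))


# Hypothetical stored vectors keyed by id.
stored = {"a": [3, 3], "b": [0, 5], "c": [1, 1]}
query = [0, 0]

print(rank(query, stored, l1))  # ['c', 'b', 'a']
print(rank(query, stored, l2))  # ['c', 'a', 'b']
```

Both metrics agree on the nearest vector here, but they order the other two differently.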

Which One Is Better?

Neither is universally better.

The right metric depends on the embedding model, data distribution, and retrieval task. Some models are trained or evaluated with one metric in mind. In that case, the best default is to use the metric expected by the model.

If the model documentation recommends cosine, L2, dot product, or another metric, follow that guidance and verify it with real search examples.

When L1 Distance Can Be Useful

L1 distance can be useful when:

  • you want a grid-like sum of component differences
  • you want a metric less dominated by large squared differences
  • your model or task was designed around Manhattan-style distance
  • you are comparing sparse or feature-engineered vectors

L1 is common in some machine learning contexts, but it is less common than cosine, dot product, or L2 in many modern text embedding search systems.

When L2 Distance Can Be Useful

L2 distance can be useful when:

  • straight-line coordinate distance matters
  • vector magnitude carries useful information
  • the embedding model was trained or evaluated with Euclidean distance
  • the vector index is designed for L2 or squared L2 search

L2 is a familiar nearest-neighbor metric and appears often in vector search and clustering.
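
On squared L2: because the square root is monotonically increasing, skipping it changes the scores but not the order of the results. The sketch below illustrates this on made-up data; squared_l2 is an illustrative helper, not a library function.

```python
import math


def squared_l2(a, b):
    """L2 distance without the final square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical stored vectors keyed by id.
stored = {"a": [3, 3], "b": [0, 5], "c": [1, 1]}
query = [0, 0]

by_l2 = sorted(stored, key=lambda vid: math.dist(query, stored[vid]))
by_squared = sorted(stored, key=lambda vid: squared_l2(query, stored[vid]))

print(by_l2 == by_squared)  # True
```

The scores themselves differ, so a threshold tuned for L2 does not carry over to squared L2 unchanged.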

Why L2 Can Emphasize Large Differences

L2 squares component differences. This gives larger gaps more weight.

Suppose one dimension differs by 1 and another differs by 10.

  • In L1, those contribute 1 and 10.
  • In L2 before the square root, they contribute 1 and 100.

This can be useful or harmful depending on the data. It is one reason metric choice should be evaluated, not guessed.
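
The same example can be expressed as each dimension's share of the total. This is a small sketch with illustrative variable names.

```python
diffs = [1, 10]

l1_total = sum(abs(d) for d in diffs)  # 1 + 10 = 11
sq_total = sum(d * d for d in diffs)   # 1 + 100 = 101

# Fraction of the total that each dimension contributes.
l1_shares = [abs(d) / l1_total for d in diffs]
sq_shares = [d * d / sq_total for d in diffs]

print([round(s, 3) for s in l1_shares])  # [0.091, 0.909]
print([round(s, 3) for s in sq_shares])  # [0.01, 0.99]
```

Under L1 the larger gap accounts for about 91% of the total. Under the squared sum used by L2 it accounts for about 99%.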

L1 vs L2 vs Cosine

L1 and L2 both measure coordinate distance. Cosine measures angle or direction.

This means cosine may treat two vectors as similar if they point in the same direction, even if their magnitudes differ. L1 and L2 are sensitive to magnitude as well: two vectors that point the same way but have different lengths are still some distance apart.

For text embeddings, cosine is a common choice. For other models or tasks, L1 or L2 may be appropriate.
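
As an illustration of the contrast, the sketch below compares a vector with a scaled copy of itself. The cosine_similarity helper is written out here for the example and assumes neither vector is all zeros.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (undefined for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


a = [1.0, 2.0]
b = [3.0, 6.0]  # same direction as a, three times the magnitude

print(cosine_similarity(a, b))                  # approximately 1.0
print(sum(abs(x - y) for x, y in zip(a, b)))    # L1: 6.0
print(math.dist(a, b))                          # L2: about 4.47
```

Cosine reports the two vectors as pointing the same way, while L1 and L2 both report a nonzero distance.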

Common Mistakes

Common mistakes include:

  • assuming L1 and L2 produce the same ranking
  • choosing a metric without checking the embedding model
  • forgetting that lower distance means closer
  • comparing raw L1 and L2 scores as if they are on the same scale
  • using a threshold from one metric with another metric
  • switching metrics without re-testing retrieval quality

Practical Rule

Use the distance metric that matches the embedding model and the retrieval task.

If you are unsure, run a small relevance evaluation with real queries. Compare whether L1, L2, cosine, or another metric retrieves the expected results.

Do not choose the metric only because the formula is familiar.
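
A small relevance evaluation along these lines could be sketched as follows. Everything here is hypothetical: the stored vectors, the labeled queries, and the hit_rate_at_k helper are made up for illustration, and the toy data is deliberately built so the metrics disagree. It says nothing about which metric works better on real data.

```python
import math


def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def l2(a, b):
    return math.dist(a, b)


def hit_rate_at_k(queries, stored, metric, k):
    """Fraction of queries whose expected id appears in the top k results."""
    hits = 0
    for query_vec, expected_id in queries:
        ranked = sorted(stored, key=lambda vid: metric(query_vec, stored[vid]))
        if expected_id in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Hypothetical stored vectors and (query vector, expected id) pairs.
stored = {"a": [3, 3], "b": [0, 5]}
queries = [([0, 0], "a"), ([0, 4], "b")]

for name, metric in [("L1", l1), ("L2", l2)]:
    print(name, hit_rate_at_k(queries, stored, metric, k=1))
# L1 0.5
# L2 1.0
```

In practice the queries and expected results would come from your own data, and the same loop could include cosine or any other candidate metric.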

Summary

L1 distance adds absolute component differences. L2 distance measures Euclidean straight-line distance by squaring differences, adding them, and taking the square root.

L1 is Manhattan-style distance. L2 is Euclidean-style distance. Both are distance metrics, so lower values mean closer vectors.

In vector search, the best choice depends on the embedding model, data, and relevance evaluation.