📊 L1 vs L2 Loss: Understanding Error in Machine Learning
When we build a machine learning model, we are essentially asking one question:
“How wrong is my model?”
This question is answered using something called a loss function. It measures the difference between what the model predicts and what actually happens.
Two of the most commonly used loss functions are L1 Loss and L2 Loss. At first glance, they seem similar — but they behave very differently in practice.
📑 Table of Contents
- Why Loss Functions Matter
- Understanding L1 Loss
- Understanding L2 Loss
- Core Difference
- Code Example
- CLI Output
- Key Takeaways
🧠 Why Loss Functions Matter
Imagine you are predicting how many apples will be harvested. Your model makes guesses, but those guesses are not always correct.
The loss function acts like a scorekeeper. It tells you how far off your predictions are — and more importantly — how the model should improve.
📌 Deep Insight
A model doesn’t learn directly from data — it learns by minimizing loss. Different loss functions guide the model in different directions.
📏 L1 Loss — Measuring Absolute Error
L1 loss, also known as Mean Absolute Error, measures how far predictions are from actual values using simple distance.
It does not care whether the error is positive or negative — it only cares about how big the mistake is.
Let’s go back to the apple example.
Actual harvest is 100 apples. If your prediction is 90, the error is 10.
If you make multiple predictions, you simply average all these differences.
📝 Step-by-Step Example
Prediction errors:
90 → error 10
95 → error 5
105 → error 5
Average error = (10 + 5 + 5) / 3 ≈ 6.67
What makes L1 interesting is its fairness — every error is treated equally.
A mistake of 50 counts as only five times worse than a mistake of 10, not dramatically worse. This makes L1 loss resistant to extreme values.
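The worked example above can be checked with a few lines of plain Python; this is just a sketch using the same apple numbers:

```python
# Actual harvest and the three predictions from the example above
actual = 100
predictions = [90, 95, 105]

# L1: average of the absolute differences
errors = [abs(actual - p) for p in predictions]
l1 = sum(errors) / len(errors)

print(errors)        # [10, 5, 5]
print(round(l1, 2))  # 6.67
```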
📐 L2 Loss — Squaring the Error
L2 loss, also known as Mean Squared Error, takes a different approach.
Instead of just measuring the difference, it squares the error.
This small change has a big impact.
Errors grow rapidly when squared:
10 → 100
5 → 25
📝 Step-by-Step Example
Prediction errors:
90 → error 10 → squared 100
95 → error 5 → squared 25
105 → error 5 → squared 25
Average = (100 + 25 + 25) / 3 = 50
This means L2 loss strongly punishes large mistakes.
Even one big error can dominate the total loss.
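The same three predictions can be run through the L2 calculation; again a small sketch in plain Python:

```python
# Actual harvest and the three predictions from the example above
actual = 100
predictions = [90, 95, 105]

# L2: average of the squared differences
squared = [(actual - p) ** 2 for p in predictions]
l2 = sum(squared) / len(squared)

print(squared)  # [100, 25, 25]
print(l2)       # 50.0
```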
⚖️ The Real Difference (Intuition)
The difference between L1 and L2 is not just mathematical — it reflects a philosophy.
L1 says: “Every mistake matters equally.”
L2 says: “Big mistakes are much worse than small ones.”
So the choice depends on what kind of mistakes you care about.
📌 When to Use Each
Use L1 when your data contains outliers and you want stability. Use L2 when large errors are unacceptable and must be minimized aggressively.
🧮 The Math Behind L1 and L2 Loss (Made Simple)
Now that we understand the intuition, let’s briefly look at the mathematical form — without making it complicated.
Don’t worry — the goal here is not to memorize formulas, but to understand what they are doing.
📏 L1 Loss Formula
L1 Loss = (1/n) * Σ |y_actual - y_predicted|
Here’s what this means in simple terms:
- Take the difference between actual and predicted values
- Convert it into a positive number (absolute value)
- Add all errors together
- Divide by the total number of data points
So L1 is basically calculating the average distance between prediction and reality.
📌 Intuition
Think of this like measuring how far you missed a target, without caring if you missed left or right. Only the distance matters.
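Translated directly into code, the L1 formula is almost a one-liner; `l1_loss` is a hypothetical helper name used here for illustration:

```python
def l1_loss(y_actual, y_predicted):
    """Mean Absolute Error: the average of |y_actual - y_predicted|."""
    n = len(y_actual)
    return sum(abs(a - p) for a, p in zip(y_actual, y_predicted)) / n

# Same numbers as the apple example
print(round(l1_loss([100, 100, 100], [90, 95, 105]), 2))  # 6.67
```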
📐 L2 Loss Formula
L2 Loss = (1/n) * Σ (y_actual - y_predicted)²
This looks similar, but there’s one key difference — the error is squared.
So instead of just measuring distance, we:
- Calculate the difference
- Square it (multiply it by itself)
- Add all squared errors
- Take the average
This squaring step is what changes everything.
📌 Intuition
Squaring makes big errors much bigger.
For example:
Error of 2 → becomes 4
Error of 10 → becomes 100
So large mistakes dominate the loss.
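The L2 formula translates just as directly; `l2_loss` is again a hypothetical helper name:

```python
def l2_loss(y_actual, y_predicted):
    """Mean Squared Error: the average of (y_actual - y_predicted) squared."""
    n = len(y_actual)
    return sum((a - p) ** 2 for a, p in zip(y_actual, y_predicted)) / n

# Same numbers as the apple example
print(l2_loss([100, 100, 100], [90, 95, 105]))  # 50.0
```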
⚖️ Why This Difference Matters Mathematically
Mathematically, L1 grows in a straight line (linearly), while L2 grows quadratically, so the penalty accelerates as errors increase.
This means:
- L1 treats all errors evenly
- L2 increasingly punishes larger errors
That’s why L2 is sensitive to outliers, while L1 is more stable.
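One way to see this sensitivity is to add a single bad prediction to the apple example and watch how each loss reacts. The 60-apple outlier below is invented for illustration:

```python
actual = 100
clean = [90, 95, 105]        # the predictions from the example
with_outlier = clean + [60]  # one wildly wrong prediction (hypothetical)

def mae(preds):
    return sum(abs(actual - p) for p in preds) / len(preds)

def mse(preds):
    return sum((actual - p) ** 2 for p in preds) / len(preds)

print(round(mae(clean), 2), mae(with_outlier))  # 6.67 15.0
print(mse(clean), mse(with_outlier))            # 50.0 437.5
```

The single outlier roughly doubles the L1 loss but inflates the L2 loss almost ninefold, which is exactly the trade-off described above.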
📝 One-Line Summary
L1 = linear penalty. L2 = quadratic penalty (squaring makes the penalty grow with the square of the error).
💻 Code Example
import numpy as np
y_true = np.array([100, 100, 100])
y_pred = np.array([90, 95, 105])
# L1 Loss (MAE)
l1 = np.mean(np.abs(y_true - y_pred))
# L2 Loss (MSE)
l2 = np.mean((y_true - y_pred)**2)
print("L1 Loss:", round(l1, 2))
print("L2 Loss:", round(l2, 2))
This code shows how both loss functions compute error differently using the same predictions.
🖥️ CLI Output Example
Calculating Loss...
L1 Loss: 6.67
L2 Loss: 50.0
Observation: L2 is much larger because it penalizes bigger errors more heavily.
💡 Key Takeaways
L1 loss gives a balanced, stable view of error and is less affected by extreme values. L2 loss magnifies large errors, making it useful when big mistakes must be avoided.
Neither is universally better — the right choice depends on the nature of your data and the cost of errors in your problem.
🔗 Related Articles
- TPR vs FPR in Machine Learning
- Softmax vs Probability
- Beyond Accuracy
- Entropy in Machine Learning
- Linear Regression Explained
🏁 Final Thought
A loss function is more than a formula — it defines what your model considers “important.” Choose it carefully, because it directly shapes how your model learns from mistakes.