
Tuesday, October 8, 2024

Difference Between L1 Loss and L2 Loss in Machine Learning



📉 L1 vs L2 Loss: Understanding Error in Machine Learning

When we build a machine learning model, we are essentially asking one question:

“How wrong is my model?”

This question is answered using something called a loss function. It measures the difference between what the model predicts and what actually happens.

Two of the most commonly used loss functions are L1 Loss and L2 Loss. At first glance, they seem similar — but they behave very differently in practice.



🧠 Why Loss Functions Matter

Imagine you are predicting how many apples will be harvested. Your model makes guesses, but those guesses are not always correct.

The loss function acts like a scorekeeper. It tells you how far off your predictions are — and more importantly — how the model should improve.

📖 Deep Insight

A model doesn’t learn directly from data — it learns by minimizing loss. Different loss functions guide the model in different directions.


๐Ÿ“ L1 Loss — Measuring Absolute Error

L1 loss, also known as Mean Absolute Error, measures how far predictions are from actual values using simple distance.

It does not care whether the error is positive or negative — it only cares about how big the mistake is.

Let’s go back to the apple example.

Actual harvest is 100 apples. If your prediction is 90, the error is 10.

If you make multiple predictions, you simply average all these differences.

📖 Step-by-Step Example

Prediction errors:

90 → error 10
95 → error 5
105 → error 5

Average error = (10 + 5 + 5) / 3 = 6.67

What makes L1 interesting is its fairness — every error is treated equally.

A mistake of 50 is penalized exactly five times as much as a mistake of 10, in direct proportion to its size and nothing more. This proportional treatment makes L1 loss resistant to extreme values.


๐Ÿ“ L2 Loss — Squaring the Error

L2 loss, also known as Mean Squared Error, takes a different approach.

Instead of just measuring the difference, it squares the error.

This small change has a big impact.

Errors grow rapidly when squared:

10 becomes 100
5 becomes 25

📖 Step-by-Step Example

Prediction errors:

90 → error 10 → squared 100
95 → error 5 → squared 25
105 → error 5 → squared 25

Average = (100 + 25 + 25) / 3 = 50

This means L2 loss strongly punishes large mistakes.

Even one big error can dominate the total loss.
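
To make this concrete, here is a small sketch (the error values are made up purely for illustration) showing how much of the squared total a single large error can account for:

import numpy as np

# Hypothetical errors: three small misses and one big one
errors = np.array([5, 5, 5, 50])

squared = errors**2                         # [25, 25, 25, 2500]
mse = squared.mean()                        # 643.75
share_of_big_error = squared[-1] / squared.sum()

print("MSE:", mse)
print("Share of loss from the single big error:", round(share_of_big_error, 3))  # about 0.971

In this made-up case, one prediction contributes roughly 97% of the entire squared loss.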


⚖️ The Real Difference (Intuition)

The difference between L1 and L2 is not just mathematical — it reflects a philosophy.

L1 says: “Every mistake matters equally.”

L2 says: “Big mistakes are much worse than small ones.”

So the choice depends on what kind of mistakes you care about.

📖 When to Use Each

Use L1 when your data contains outliers and you want stability. Use L2 when large errors are unacceptable and must be minimized aggressively.
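
As a rough illustration (the outlier prediction below is invented), adding one wild prediction barely moves L1 but inflates L2:

import numpy as np

y_true = np.array([100, 100, 100, 100])
y_pred_clean   = np.array([90, 95, 105, 100])
y_pred_outlier = np.array([90, 95, 105, 200])   # one hypothetical wild prediction

def mae(a, b):
    # L1 loss: average absolute difference
    return np.mean(np.abs(a - b))

def mse(a, b):
    # L2 loss: average squared difference
    return np.mean((a - b)**2)

print("MAE clean:", mae(y_true, y_pred_clean), "| MAE with outlier:", mae(y_true, y_pred_outlier))
print("MSE clean:", mse(y_true, y_pred_clean), "| MSE with outlier:", mse(y_true, y_pred_outlier))

Here the single outlier multiplies the L1 loss by about six, but the L2 loss by nearly seventy, which is why L1 is the safer choice when such values are expected.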


🧮 The Math Behind L1 and L2 Loss (Made Simple)

Now that we understand the intuition, let’s briefly look at the mathematical form — without making it complicated.

Don’t worry — the goal here is not to memorize formulas, but to understand what they are doing.

๐Ÿ“ L1 Loss Formula

L1 Loss = (1/n) * Σ |y_actual - y_predicted|

Here’s what this means in simple terms:

- Take the difference between actual and predicted values
- Convert it into a positive number (absolute value)
- Add all errors together
- Divide by total number of data points

So L1 is basically calculating the average distance between prediction and reality.

📖 Intuition

Think of this like measuring how far you missed a target, without caring if you missed left or right. Only the distance matters.
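
If it helps, the same recipe can be written as a minimal from-scratch sketch in plain Python, following exactly the steps listed above:

def l1_loss(y_actual, y_predicted):
    # Sum of absolute differences, divided by the number of points
    total = 0
    for a, p in zip(y_actual, y_predicted):
        total += abs(a - p)
    return total / len(y_actual)

print(l1_loss([100, 100, 100], [90, 95, 105]))  # 6.666... — matches the apple example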


๐Ÿ“ L2 Loss Formula

L2 Loss = (1/n) * Σ (y_actual - y_predicted)²

This looks similar, but there’s one key difference — the error is squared.

So instead of just measuring distance, we:

- Calculate the difference
- Square it (multiply it by itself)
- Add all squared errors
- Take the average

This squaring step is what changes everything.

📖 Intuition

Squaring makes big errors much bigger. For example:

Error of 2 → becomes 4
Error of 10 → becomes 100

So large mistakes dominate the loss.
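
A matching from-scratch sketch; the only change from the L1 version is the squaring step:

def l2_loss(y_actual, y_predicted):
    # Sum of squared differences, divided by the number of points
    total = 0
    for a, p in zip(y_actual, y_predicted):
        total += (a - p) ** 2
    return total / len(y_actual)

print(l2_loss([100, 100, 100], [90, 95, 105]))  # 50.0 — matches the worked example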


⚖️ Why This Difference Matters Mathematically

Mathematically, L1 grows in a straight line with the error, while L2 grows quadratically: doubling the error quadruples the penalty.

This means:

- L1 treats all errors evenly
- L2 increasingly punishes larger errors

That’s why L2 is sensitive to outliers, while L1 is more stable.

📖 One-Line Summary

L1 = linear penalty
L2 = quadratic penalty (due to squaring)
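
A short sketch that prints how the two penalties grow as the error increases (arbitrary error values, purely for illustration):

for error in [1, 2, 5, 10, 20]:
    # L1 penalty grows linearly, L2 penalty grows quadratically
    print(f"error={error:2d}  L1 penalty={error:3d}  L2 penalty={error**2:4d}")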


💻 Code Example

import numpy as np

y_true = np.array([100, 100, 100])
y_pred = np.array([90, 95, 105])

# L1 Loss (MAE)
l1 = np.mean(np.abs(y_true - y_pred))

# L2 Loss (MSE)
l2 = np.mean((y_true - y_pred)**2)

print("L1 Loss:", l1)
print("L2 Loss:", l2)

This code shows how both loss functions compute error differently using the same predictions.


🖥️ CLI Output Example

L1 Loss: 6.666666666666667
L2 Loss: 50.0

Observation:
L2 is much larger because it penalizes bigger errors more heavily

💡 Key Takeaways

L1 loss gives a balanced, stable view of error and is less affected by extreme values. L2 loss magnifies large errors, making it useful when big mistakes must be avoided.

Neither is universally better — the right choice depends on the nature of your data and the cost of errors in your problem.



📌 Final Thought

A loss function is more than a formula — it defines what your model considers “important.” Choose it carefully, because it directly shapes how your model learns from mistakes.
