Huber Loss Explained: A Complete Guide for Machine Learning
Table of Contents
- Introduction
- What is Huber Loss?
- Mathematical Formula
- MSE vs MAE vs Huber
- Worked Example
- Code Implementation
- When to Use
- Key Takeaways
- Related Articles
Introduction
In machine learning, choosing the right loss function can significantly impact model performance. Huber Loss is a hybrid loss function designed to balance sensitivity and robustness.
What is Huber Loss?
Huber Loss is used in regression tasks to measure prediction error. It behaves differently depending on how large the error is.
- Small errors → treated like MSE (quadratic)
- Large errors → treated like MAE (linear)
Mathematical Formula
Huber Loss is defined as:
$$ L_{\delta}(a) = \begin{cases} \frac{1}{2}a^2 & \text{if } |a| \leq \delta \\ \delta (|a| - \frac{1}{2}\delta) & \text{otherwise} \end{cases} $$
Where:
- \( a = y - \hat{y} \) (the prediction error)
- \( \delta \) = threshold separating the quadratic and linear regions
MSE vs MAE vs Huber
| Loss Function | Behavior | Outlier Sensitivity |
|---|---|---|
| MSE | Quadratic | High |
| MAE | Linear | Low |
| Huber | Hybrid | Moderate |
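To make the table concrete, here is a small sketch (plain NumPy, with illustrative numbers) comparing the three losses on a set of residuals containing one outlier:

```python
import numpy as np

errors = np.array([1.0, -2.0, 1.5, 30.0])  # last residual is an outlier
delta = 2.0

mse = np.mean(errors**2)
mae = np.mean(np.abs(errors))
huber = np.mean(np.where(np.abs(errors) <= delta,
                         0.5 * errors**2,
                         delta * (np.abs(errors) - 0.5 * delta)))

print(mse, mae, huber)  # 226.8125, 8.625, 15.40625
```

The single outlier dominates MSE (≈227), while Huber (≈15) stays much closer to MAE (≈8.6), which is exactly the "moderate sensitivity" behavior the table describes.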
Worked Example
Given actual vs predicted values:
- Actual: 200k, 250k, 300k, 3M
- Predicted: 210k, 240k, 290k, 2.8M
Errors:
$$ -10,000,\; 10,000,\; 10,000,\; 200,000 $$
Using a threshold of \( \delta = 50,000 \), the three small errors fall in the quadratic region (\( |a| \leq \delta \)):
$$ \frac{1}{2}(10,000)^2 = 50,000,000 $$
Total loss from the three small errors:
$$ 3 \times 50,000,000 = 150,000,000 $$
The large error of 200,000 falls in the linear region, where the loss is \( \delta(|a| - \frac{1}{2}\delta) \):
$$ 50,000 \times (200,000 - 25,000) = 8,750,000,000 $$
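The arithmetic above can be checked in code; the sketch below assumes a threshold of δ = 50,000:

```python
import numpy as np

actual = np.array([200_000, 250_000, 300_000, 3_000_000], dtype=float)
predicted = np.array([210_000, 240_000, 290_000, 2_800_000], dtype=float)
delta = 50_000.0

error = actual - predicted  # [-10000, 10000, 10000, 200000]
# Quadratic branch where |error| <= delta, linear branch otherwise
loss = np.where(np.abs(error) <= delta,
                0.5 * error**2,
                delta * (np.abs(error) - 0.5 * delta))
print(loss)  # [5.0e+07, 5.0e+07, 5.0e+07, 8.75e+09]
```

The first three samples each contribute 50,000,000, while the outlier contributes 8,750,000,000 — large, but far less than the 2 × 10¹⁰ that a pure quadratic penalty would give.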
Code Implementation
Python Example
```python
def huber_loss(y_true, y_pred, delta=1.0):
    # Quadratic penalty for small errors, linear penalty for large ones
    error = y_true - y_pred
    if abs(error) <= delta:
        return 0.5 * error**2
    else:
        return delta * (abs(error) - 0.5 * delta)
```
CLI-style Output
Input: y_true=10, y_pred=8, delta=1.0
Output: 1.5 (error of 2 exceeds delta, so the linear branch applies)
Input: y_true=100, y_pred=50, delta=1.0
Output: 49.5
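For batches of predictions, the per-sample losses are typically averaged. A vectorized variant (an illustrative extension of the snippet above, not part of the original) might look like this:

```python
import numpy as np

def huber_loss_mean(y_true, y_pred, delta=1.0):
    # Element-wise Huber loss, averaged over all samples
    error = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    quadratic = 0.5 * error**2
    linear = delta * (np.abs(error) - 0.5 * delta)
    return np.mean(np.where(np.abs(error) <= delta, quadratic, linear))

y_true = np.array([10.0, 100.0])
y_pred = np.array([8.0, 50.0])
result = huber_loss_mean(y_true, y_pred, delta=1.0)
print(result)  # 25.5: mean of the per-sample losses 1.5 and 49.5
```

`np.where` selects the quadratic branch where the error is within delta and the linear branch elsewhere, so a single call handles an entire batch.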
When to Use Huber Loss
- Datasets with outliers
- Regression problems
- Financial predictions
- Real estate modeling
For clean datasets without outliers, plain MSE is usually simpler and sufficient.
Key Takeaways
- Huber Loss balances MSE and MAE
- Handles outliers effectively
- Controlled by the threshold δ
- Widely used in robust regression
Conclusion
Huber Loss is a powerful and practical loss function that provides the best of both worlds: precision for small errors and robustness for large ones.
If your dataset includes outliers or unpredictable spikes, Huber Loss is often the safest and most balanced choice.