
Tuesday, October 8, 2024

Why Huber Loss Is Useful for Handling Outliers in Regression


📌 Table of Contents

  • Introduction
  • What is Huber Loss?
  • Mathematical Formula
  • MSE vs MAE vs Huber
  • Worked Example
  • Code Implementation
  • When to Use Huber Loss
  • Key Takeaways
  • Conclusion


Introduction

In machine learning, choosing the right loss function can significantly impact model performance. Huber Loss is a hybrid loss function designed to balance sensitivity and robustness.

💡 It combines the strengths of Mean Squared Error (MSE), which is precise for small errors, and Mean Absolute Error (MAE), which is robust to large ones.

What is Huber Loss?

Huber Loss is used in regression tasks to measure prediction error. It behaves differently depending on how large the error is.

  • Small errors → treated like MSE (quadratic)
  • Large errors → treated like MAE (linear)


📊 Mathematical Formula

Huber Loss is defined as:

$$ L_{\delta}(a) = \begin{cases} \frac{1}{2}a^2 & \text{if } |a| \leq \delta \\ \delta (|a| - \frac{1}{2}\delta) & \text{otherwise} \end{cases} $$

Where:

  • \( a = y - \hat{y} \) is the prediction error (residual)
  • \( \delta \) is the threshold separating the quadratic and linear regions

💡 The parameter \( \delta \) controls sensitivity to outliers: a smaller \( \delta \) makes the loss more MAE-like (more robust), while a larger \( \delta \) makes it more MSE-like.
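The piecewise definition above can be checked with a minimal sketch (the function name `huber` and the sample inputs are illustrative, not from the original post):

```python
def huber(a, delta=1.0):
    # Quadratic branch for |a| <= delta, linear branch otherwise.
    if abs(a) <= delta:
        return 0.5 * a ** 2
    return delta * (abs(a) - 0.5 * delta)

print(huber(0.5))   # quadratic branch: 0.5 * 0.25 = 0.125
print(huber(3.0))   # linear branch: 1.0 * (3.0 - 0.5) = 2.5
```

Note that the two branches meet at \( |a| = \delta \), where both give \( \tfrac{1}{2}\delta^2 \), so the loss is continuous and smooth there.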

MSE vs MAE vs Huber

Loss Function   Behavior    Outlier Sensitivity
MSE             Quadratic   High
MAE             Linear      Low
Huber           Hybrid      Moderate
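The table above can be illustrated numerically. The sketch below (the errors and \( \delta \) are chosen arbitrarily for illustration) computes all three losses on a small set of errors containing one outlier:

```python
import numpy as np

errors = np.array([1.0, 2.0, 100.0])  # the last error is an outlier
delta = 5.0

mse = np.mean(errors ** 2)            # dominated by the outlier
mae = np.mean(np.abs(errors))         # barely affected by the outlier
huber = np.mean(np.where(np.abs(errors) <= delta,
                         0.5 * errors ** 2,
                         delta * (np.abs(errors) - 0.5 * delta)))

print(mse, mae, huber)  # the Huber value sits between MAE and MSE
```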

Worked Example

Given actual vs predicted values, with threshold \( \delta = 50{,}000 \):

  • Actual: 200k, 250k, 300k, 3M
  • Predicted: 210k, 240k, 290k, 2.8M

Errors:

$$ -10,000,\; 10,000,\; 10,000,\; 200,000 $$

For each small error (\( |a| \leq \delta \)), the quadratic branch applies:

$$ \frac{1}{2}(10,000)^2 = 50,000,000 $$

Total loss from the three small errors:

$$ 150,000,000 $$

For the large error (\( |a| > \delta \)), the linear branch \( \delta(|a| - \tfrac{1}{2}\delta) \) applies:

$$ 50{,}000 \times (200{,}000 - 25{,}000) = 8{,}750{,}000{,}000 $$
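The arithmetic above can be verified with a short sketch (assuming, as in the example, a threshold of 50,000):

```python
# Worked example: per-point Huber losses with delta = 50,000.
y_true = [200_000, 250_000, 300_000, 3_000_000]
y_pred = [210_000, 240_000, 290_000, 2_800_000]
delta = 50_000

def huber(a, delta):
    if abs(a) <= delta:
        return 0.5 * a ** 2
    return delta * (abs(a) - 0.5 * delta)

losses = [huber(t - p, delta) for t, p in zip(y_true, y_pred)]
# Each small error contributes 50,000,000; the outlier contributes
# 50,000 * (200,000 - 25,000) = 8,750,000,000.
print(losses)
```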


💻 Code Implementation

Python Example

def huber_loss(y_true, y_pred, delta=1.0):
    error = y_true - y_pred
    if abs(error) <= delta:
        return 0.5 * error ** 2                  # quadratic (MSE-like) region
    return delta * (abs(error) - 0.5 * delta)    # linear (MAE-like) region

CLI-style Output

With delta=1.0:

Input: y_true=10,  y_pred=8    Output: 1.5
Input: y_true=100, y_pred=50   Output: 49.5
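For array inputs, a scalar `if abs(error) <= delta` check is ambiguous on NumPy arrays; a vectorized variant can use `np.where` instead (a sketch; the name `huber_loss_vec` is illustrative):

```python
import numpy as np

def huber_loss_vec(y_true, y_pred, delta=1.0):
    # Element-wise Huber loss, averaged over all samples.
    error = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    quadratic = 0.5 * error ** 2
    linear = delta * (np.abs(error) - 0.5 * delta)
    return np.mean(np.where(np.abs(error) <= delta, quadratic, linear))

print(huber_loss_vec([10.0], [8.0]))    # 1.5 with delta=1.0
print(huber_loss_vec([100.0], [50.0]))  # 49.5 with delta=1.0
```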

When to Use Huber Loss

  • Datasets with outliers
  • Regression problems where a few large errors should not dominate training
  • Financial predictions with occasional spikes
  • Real estate modeling

For clean datasets without noise, plain MSE is usually simpler and sufficient.

🎯 Key Takeaways

  • Huber Loss balances MSE and MAE
  • Handles outliers effectively
  • Controlled by the threshold δ
  • Widely used in robust regression

Conclusion

Huber Loss is a powerful and practical loss function that provides the best of both worlds: precision for small errors and robustness for large ones.

If your dataset includes outliers or unpredictable spikes, Huber Loss is often the safest and most balanced choice.
