Residuals in Decision Trees (Explained Simply)
📋 Table of Contents
- What is a Decision Tree?
- What Are Residuals?
- Why Residuals Matter
- How Boosting Uses Residuals
- Step-by-Step Process
- Simple Example
- Code Example
- CLI Output
- Common Mistakes
- Key Takeaways
🌳 What is a Decision Tree?
A decision tree is a model that makes decisions step-by-step, just like a flowchart.
Example:
- Is the house in the city?
- Yes → Is the size greater than 1500?
- No → Is the location premium?
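Here is that flowchart as a tiny Python sketch. The field names, thresholds, and prices are made up for illustration; a real tree would learn the splits and leaf values from data.

def predict_price(in_city, size, premium_location):
    # First split: is the house in the city?
    if in_city:
        # Second split: is the size above the threshold?
        if size > 1500:
            return 450_000  # large city house
        return 300_000      # smaller city house
    # Outside the city: is the location premium?
    if premium_location:
        return 400_000
    return 200_000

print(predict_price(True, 1800, False))  # 450000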
📊 What Are Residuals?
Residual = Actual Value - Predicted Value
Example:
- Actual price = 450,000
- Predicted = 400,000
- Residual = 50,000
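In code, residuals are just element-wise subtraction. The numbers below are invented to mirror the house-price example:

import numpy as np

actual    = np.array([450_000, 300_000, 500_000])
predicted = np.array([400_000, 320_000, 480_000])

residuals = actual - predicted  # what the model got wrong, per example
print(residuals)                # [ 50000 -20000  20000]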
🧠 Why Residuals Matter
A normal decision tree makes its predictions once and stops.
But what if we could:
- Find the mistakes
- Fix them
- Improve step-by-step
That is exactly what boosting does: it uses residuals as the signal for what to fix next.
🚀 How Boosting Uses Residuals
Boosting builds multiple trees, one after another.
Each new tree focuses only on the mistakes of the previous trees.
So instead of:
- One big tree
We get:
- Many small trees fixing errors step-by-step
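A small sketch of that contrast, assuming scikit-learn is available (the toy data, depths, and tree counts are arbitrary choices):

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import GradientBoostingRegressor

X = np.arange(20).reshape(-1, 1)
y = 3.0 * X.ravel() + 5.0

# One big tree: a single deep model that tries to get everything right at once.
big_tree = DecisionTreeRegressor().fit(X, y)

# Many small trees: 100 depth-1 stumps, each one nudging the previous answer.
many_small = GradientBoostingRegressor(max_depth=1, n_estimators=100).fit(X, y)

print(big_tree.predict([[10]]), many_small.predict([[10]]))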
🔁 Step-by-Step Process
- Build first tree → get predictions
- Calculate residuals (errors)
- Build second tree on residuals
- Add predictions together
- Repeat
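Here is a minimal from-scratch sketch of those five steps, assuming scikit-learn's DecisionTreeRegressor as the weak learner (the data, stump depth, and 20 rounds are arbitrary choices):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[1], [2], [3], [4]])
y = np.array([10.0, 20.0, 30.0, 40.0])

# Step 1: build the first tree and get its predictions.
trees = [DecisionTreeRegressor(max_depth=1).fit(X, y)]
pred = trees[0].predict(X)

for _ in range(20):
    # Step 2: calculate the residuals (errors).
    residuals = y - pred
    # Step 3: build the next tree on those residuals.
    tree = DecisionTreeRegressor(max_depth=1).fit(X, residuals)
    trees.append(tree)
    # Step 4: add its predictions to the running total.
    pred = pred + tree.predict(X)
    # Step 5: repeat.

print(pred)  # close to [10. 20. 30. 40.]

# Predicting a new point means summing every tree's contribution.
x_new = np.array([[2]])
print(sum(t.predict(x_new) for t in trees))  # close to [20.]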
🧮 Simple Example
Let’s say:
- Actual = 100
- Tree 1 predicts = 80
Residual = 20
Now:
- Tree 2 learns to predict = 20
Final prediction:
80 + 20 = 100 ✔
💻 Code Example (Gradient Boosting)
from sklearn.ensemble import GradientBoostingRegressor
import numpy as np

# Four training points on a simple linear pattern.
X = np.array([[1], [2], [3], [4]])
y = np.array([10, 20, 30, 40])

# Fit a boosted ensemble and predict an unseen input.
model = GradientBoostingRegressor()
model.fit(X, y)
print(model.predict([[5]]))
🖥 CLI Output Example
[39.9996]
Note that the answer is approximately 40, not 50. Tree-based models cannot extrapolate beyond their training targets, so the unseen input 5 lands in the same leaf as 4. Within the training range, though, the model does gradually learn the correct values by fitting residuals.
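If you want to watch that gradual learning, scikit-learn's staged_predict yields the ensemble's running prediction after each tree is added. A small sketch, reusing the toy data from above:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.array([[1], [2], [3], [4]])
y = np.array([10, 20, 30, 40])
model = GradientBoostingRegressor().fit(X, y)

# Watch the prediction for X=4 climb from the initial mean (25) toward 40
# as more residual-fitting trees are added.
for i, staged in enumerate(model.staged_predict(np.array([[4]])), start=1):
    if i == 1 or i % 25 == 0:
        print(f"after {i:3d} trees: {staged[0]:.3f}")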
⚠️ Common Mistakes
- Thinking residuals are just random noise (they are a useful signal to learn from)
- Using too many trees, which leads to overfitting (see the sketch below)
- Forgetting that the learning is sequential: each tree depends on the ones trained before it
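The usual guard against the "too many trees" mistake is a smaller learning rate plus early stopping. A sketch with arbitrary parameter values and synthetic data:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X.ravel() + rng.normal(0, 1, size=200)

# Cap the ensemble and stop early once a held-out 20% slice stops improving.
model = GradientBoostingRegressor(
    n_estimators=1000,        # an upper bound, not a target
    learning_rate=0.05,       # smaller steps usually generalize better
    validation_fraction=0.2,  # data held out to monitor overfitting
    n_iter_no_change=10,      # stop after 10 rounds without improvement
).fit(X, y)

print(model.n_estimators_)  # trees actually built before stopping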
🎯 Key Takeaways
- Residual = actual value - predicted value
- A single tree predicts once and stops; boosting keeps correcting itself
- Each new tree is trained on the residuals left by the trees before it
- The final prediction is the sum of all the trees' predictions
- More trees is not always better: too many can overfit
💡 Final Thought
Residuals turn a simple model into a powerful one by teaching it: "Learn from your mistakes."