Friday, December 27, 2024

Impact of Regularization on Decision Tree Regression



🌳 Decision Tree Regularization – A Story of Simplicity vs Complexity

Imagine you're trying to draw a smooth curve through a messy set of points. Do you draw a simple line… or a super detailed zig-zag that passes through every point?

That exact dilemma has a name: underfitting vs overfitting.

This blog walks you through it using decision trees—step by step.


📚 Table of Contents

  • Step 1: Generating Data
  • Step 2: Two Competing Models
  • The Math (Made Easy)
  • Step 3: Predictions
  • Understanding the Plot
  • What is Regularization?
  • Key Takeaways
  • Final Insight

🎲 Step 1: Generating Data

import numpy as np

rng = np.random.RandomState(1)  # fixed seed for reproducibility (the seed value is an assumption)

X = np.sort(5 * rng.rand(80, 1), axis=0)  # 80 sorted points in [0, 5)
y = np.sin(X).ravel()                     # base pattern: a sine curve
y[::5] += 3 * (0.5 - rng.rand(16))        # noise on every 5th point (16 in total)

This creates:

  • 80 random points between 0 and 5
  • A sine curve as the base pattern
  • Noise added every 5th point
👉 Real-world data is never clean—noise simulates reality.

⚙️ Step 2: Two Competing Models

Model | Settings | Behavior
--- | --- | ---
Model 1 | max_depth=2 | Simple (Underfits)
Model 2 | max_depth=5, min_samples_leaf=10 | Complex (Balanced)
from sklearn.tree import DecisionTreeRegressor

regr_1 = DecisionTreeRegressor(max_depth=2)                       # shallow tree
regr_2 = DecisionTreeRegressor(max_depth=5, min_samples_leaf=10)  # deeper, leaf-constrained tree
regr_1.fit(X, y)
regr_2.fit(X, y)
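
A quick optional sanity check: fitted scikit-learn trees expose their learned structure, so you can compare the two models' complexity directly (a small sketch using the standard get_depth() and get_n_leaves() estimator methods):

# How complex did each tree actually get?
print("Model 1: depth =", regr_1.get_depth(), "| leaves =", regr_1.get_n_leaves())
print("Model 2: depth =", regr_2.get_depth(), "| leaves =", regr_2.get_n_leaves())

With max_depth=2, Model 1 can have at most 4 leaves; Model 2 grows deeper, but min_samples_leaf=10 guarantees every leaf is backed by at least 10 training points.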

๐Ÿ“ The Math (Made Easy)

1. Model Error

\[ Error = Bias^2 + Variance + Noise \]

Simple Meaning:

  • Bias → Too simple (misses pattern)
  • Variance → Too complex (fits noise)
  • Noise → Randomness in data
👉 Good model = Balance between bias and variance
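
You can check that balance empirically with cross-validation. A minimal sketch, assuming the X, y, regr_1, and regr_2 defined in the earlier steps (the fold count and seed are illustrative choices):

from sklearn.model_selection import KFold, cross_val_score

# X was generated in sorted order, so shuffle before splitting;
# unshuffled folds would each hold out a contiguous region of X
cv = KFold(n_splits=5, shuffle=True, random_state=0)
score_1 = cross_val_score(regr_1, X, y, cv=cv).mean()
score_2 = cross_val_score(regr_2, X, y, cv=cv).mean()
print(f"Depth-2 tree mean CV R^2: {score_1:.3f}")
print(f"Depth-5 tree mean CV R^2: {score_2:.3f}")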

2. Tree Depth Effect

\[ Depth \uparrow \Rightarrow Variance \uparrow \]

\[ Depth \downarrow \Rightarrow Bias \uparrow \]

Meaning:

  • Deeper trees → more flexible → risk overfitting
  • Shallow trees → more rigid → risk underfitting
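
A small experiment makes this trade-off visible: sweep max_depth and compare training vs held-out R². A sketch on the same data, with an illustrative train/test split (not from the original post):

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in [1, 2, 3, 5, 8, None]:  # None = grow until leaves are pure
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    # Train R^2 typically keeps rising with depth; test R^2 peaks, then falls
    print(f"max_depth={depth}: train R^2 = {tree.score(X_tr, y_tr):.2f}, "
          f"test R^2 = {tree.score(X_te, y_te):.2f}")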

3. Leaf Constraint

\[ Leaf\ Size \uparrow \Rightarrow Smoother\ Model \]

This prevents tiny splits that memorize noise.
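
Since each leaf contributes one constant prediction, you can measure smoothness by counting leaves as min_samples_leaf grows. A short sketch reusing X and y from Step 1:

from sklearn.tree import DecisionTreeRegressor

for leaf in [1, 5, 10, 20]:
    tree = DecisionTreeRegressor(max_depth=5, min_samples_leaf=leaf).fit(X, y)
    # Fewer leaves = fewer prediction steps = a smoother piecewise-constant curve
    print(f"min_samples_leaf={leaf}: {tree.get_n_leaves()} leaves")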


💻 Step 3: Predictions

X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]  # dense grid of inputs over [0, 5)
y_1 = regr_1.predict(X_test)  # shallow-tree predictions
y_2 = regr_2.predict(X_test)  # deeper-tree predictions
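
For contrast, here is what comparing the models on training error looks like. It deliberately rewards the deeper tree, which is exactly why it shouldn't be your only metric (a sketch using scikit-learn's mean_squared_error on the training points):

from sklearn.metrics import mean_squared_error

# Training-set MSE: the deeper tree almost always "wins" here...
print(f"Model 1 (depth=2) train MSE: {mean_squared_error(y, regr_1.predict(X)):.3f}")
print(f"Model 2 (depth=5) train MSE: {mean_squared_error(y, regr_2.predict(X)):.3f}")
# ...but low training error says nothing about unseen data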

🖥️ CLI Output (Conceptual)

Model 1 (Depth=2):
  • Smooth curve
  • Misses fluctuations

Model 2 (Depth=5):
  • Follows data closely
  • Captures more detail

📊 Understanding the Plot

  • Orange dots → Actual noisy data
  • Blue line → Shallow tree
  • Green line → Deeper tree
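
The figure described here takes only a few lines of matplotlib to reproduce. A minimal sketch, assuming the X, y, X_test, y_1, and y_2 from the steps above (the exact colors and styling are assumptions chosen to match the description):

import matplotlib.pyplot as plt

plt.figure()
plt.scatter(X, y, s=20, c="darkorange", label="noisy data")               # orange dots
plt.plot(X_test, y_1, color="cornflowerblue", lw=2, label="max_depth=2")  # blue line
plt.plot(X_test, y_2, color="yellowgreen", lw=2, label="max_depth=5")     # green line
plt.xlabel("data")
plt.ylabel("target")
plt.title("Decision Tree Regression")
plt.legend()
plt.show()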

🔵 Shallow Tree (max_depth=2)

  • Captures overall trend
  • Misses detail
  • Underfitting

🟢 Deeper Tree (max_depth=5)

  • Captures more patterns
  • More flexible
  • Risk of overfitting
👉 The goal is NOT a perfect fit—it's generalization.

🛡️ What is Regularization?

Regularization controls how complex your model becomes.

Key Techniques:

  • max_depth → limits tree size
  • min_samples_leaf → prevents tiny splits

Think of it like:

“Don’t let the model memorize—force it to learn patterns.”
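
If you'd rather not hand-tune these knobs, cross-validation can pick them for you. A hedged sketch using scikit-learn's GridSearchCV (the parameter grid and seed are illustrative choices):

from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeRegressor

param_grid = {
    "max_depth": [2, 3, 5, 8],
    "min_samples_leaf": [1, 5, 10, 20],
}
cv = KFold(n_splits=5, shuffle=True, random_state=0)  # shuffled folds: X is sorted
search = GridSearchCV(DecisionTreeRegressor(random_state=0), param_grid, cv=cv)
search.fit(X, y)
print("Best regularization settings:", search.best_params_)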

💡 Key Takeaways

  • Shallow trees = simple but may underfit
  • Deep trees = powerful but may overfit
  • Regularization balances both
  • Math helps explain model behavior clearly

🎯 Final Insight

A perfect model is not the one that fits the training data best…

It’s the one that performs best on unseen data.
