🌳 Decision Tree Regularization – A Story of Simplicity vs Complexity
Imagine you're trying to draw a smooth curve through a messy set of points. Do you draw a simple line… or a super detailed zig-zag that passes through every point?
This blog walks you through it using decision trees—step by step.
📌 Table of Contents
- Data Generation
- Two Models
- Math Behind It
- Code Example
- CLI Output
- Understanding the Plot
- Regularization Explained
- Key Takeaways
- Related Articles
🎲 Step 1: Generating Data
```python
import numpy as np

rng = np.random.RandomState(1)  # fixed seed so the noise is reproducible
X = np.sort(5 * rng.rand(80, 1), axis=0)   # 80 points in [0, 5), sorted
y = np.sin(X).ravel()                      # underlying sine pattern
y[::5] += 3 * (0.5 - rng.rand(16))         # noise on every 5th point (16 total)
```
This creates:
- 80 random points between 0 and 5
- A sine curve as the base pattern
- Noise added to every 5th point
⚙️ Step 2: Two Competing Models
| Model | Settings | Behavior |
|---|---|---|
| Model 1 | max_depth=2 | Simple (Underfits) |
| Model 2 | max_depth=5, min_samples_leaf=10 | Complex (Balanced) |
```python
from sklearn.tree import DecisionTreeRegressor

regr_1 = DecisionTreeRegressor(max_depth=2)
regr_2 = DecisionTreeRegressor(max_depth=5, min_samples_leaf=10)
regr_1.fit(X, y)
regr_2.fit(X, y)
```
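You can check how different the two fitted trees really are before plotting anything. A runnable sketch (it regenerates the toy data from Step 1, with the seed assumed to be `RandomState(1)`):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Recreate the toy dataset from Step 1 (seed assumed; any fixed seed works)
rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(16))

regr_1 = DecisionTreeRegressor(max_depth=2).fit(X, y)
regr_2 = DecisionTreeRegressor(max_depth=5, min_samples_leaf=10).fit(X, y)

# A depth-2 tree can have at most 2**2 = 4 leaves; the deeper tree grows
# more leaves, but each one must cover at least 10 training points.
print("Model 1 leaves:", regr_1.get_n_leaves())
print("Model 2 leaves:", regr_2.get_n_leaves())
```

More leaves means more (smaller) constant prediction regions, i.e. more flexibility.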
📐 The Math (Made Easy)
1. Model Error
\[ \text{Error} = \text{Bias}^2 + \text{Variance} + \text{Noise} \]
Simple Meaning:
- Bias → Too simple (misses pattern)
- Variance → Too complex (fits noise)
- Noise → Randomness in data
2. Tree Depth Effect
\[ \text{Depth} \uparrow \;\Rightarrow\; \text{Variance} \uparrow \]
\[ \text{Depth} \downarrow \;\Rightarrow\; \text{Bias} \uparrow \]
Meaning:
- Deeper trees → more flexible → risk overfitting
- Shallow trees → more rigid → risk underfitting
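This trade-off is easy to measure empirically: refit each model on many noisy resamples of the same sine curve and see how much the predictions wobble (variance) versus how far their average sits from the true curve (squared bias). A Monte-Carlo sketch, using a simpler Gaussian noise model than Step 1 and an illustrative helper name `prediction_spread`:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X_test = np.linspace(0, 5, 50)[:, None]
true_y = np.sin(X_test).ravel()

def prediction_spread(max_depth, n_rounds=200):
    """Refit on fresh noisy samples; return (squared bias, variance)."""
    preds = []
    for _ in range(n_rounds):
        X = np.sort(5 * rng.rand(80, 1), axis=0)
        y = np.sin(X).ravel() + 0.3 * rng.randn(80)  # Gaussian noise sketch
        model = DecisionTreeRegressor(max_depth=max_depth).fit(X, y)
        preds.append(model.predict(X_test))
    preds = np.array(preds)
    bias2 = ((preds.mean(axis=0) - true_y) ** 2).mean()  # avg. pred vs truth
    variance = preds.var(axis=0).mean()                  # spread across refits
    return bias2, variance

b_shallow, v_shallow = prediction_spread(max_depth=2)
b_deep, v_deep = prediction_spread(max_depth=8)
print(f"depth=2: bias^2={b_shallow:.3f}, variance={v_shallow:.3f}")
print(f"depth=8: bias^2={b_deep:.3f}, variance={v_deep:.3f}")
```

The shallow tree shows higher squared bias; the deep tree shows higher variance — exactly the two arrows above.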
3. Leaf Constraint
\[ \text{Leaf Size} \uparrow \;\Rightarrow\; \text{Smoother Model} \]
This prevents tiny splits that memorize noise.
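You can see this directly by counting leaves with and without the constraint (a sketch on the Step 1 toy data, seed assumed):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(16))

# Same depth budget, with and without the leaf-size constraint
free = DecisionTreeRegressor(max_depth=5).fit(X, y)
constrained = DecisionTreeRegressor(max_depth=5, min_samples_leaf=10).fit(X, y)

print("Unconstrained leaves:", free.get_n_leaves())
print("Constrained leaves:  ", constrained.get_n_leaves())
```

With 80 points and `min_samples_leaf=10`, the constrained tree can never have more than 8 leaves, so no leaf can chase a single noisy point.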
💻 Step 3: Predictions
```python
X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]  # dense grid for smooth curves
y_1 = regr_1.predict(X_test)
y_2 = regr_2.predict(X_test)
```
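Since we know the true underlying function is a sine, we can score both sets of predictions against the clean curve — a proxy for performance on unseen data. A runnable sketch (data regenerated, seed assumed):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(16))

regr_1 = DecisionTreeRegressor(max_depth=2).fit(X, y)
regr_2 = DecisionTreeRegressor(max_depth=5, min_samples_leaf=10).fit(X, y)

X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
y_1 = regr_1.predict(X_test)
y_2 = regr_2.predict(X_test)

# Mean squared error against the noise-free sine
truth = np.sin(X_test).ravel()
mse_1 = float(np.mean((y_1 - truth) ** 2))
mse_2 = float(np.mean((y_2 - truth) ** 2))
print(f"Model 1 MSE vs true curve: {mse_1:.4f}")
print(f"Model 2 MSE vs true curve: {mse_2:.4f}")
```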
🖥️ CLI Output (Conceptual)
Model 1 (Depth=2):
- Smooth curve
- Misses fluctuations

Model 2 (Depth=5):
- Follows data closely
- Captures more detail
📊 Understanding the Plot
- Orange dots → Actual noisy data
- Blue line → Shallow tree
- Green line → Deeper tree
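The plot described above can be reproduced with matplotlib. A sketch — the colors follow the legend here, the toy data is regenerated with an assumed seed, and the output filename `tree_regression.png` is illustrative:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(16))

regr_1 = DecisionTreeRegressor(max_depth=2).fit(X, y)
regr_2 = DecisionTreeRegressor(max_depth=5, min_samples_leaf=10).fit(X, y)

X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]

plt.figure()
plt.scatter(X, y, s=20, edgecolor="black", c="darkorange", label="data")
plt.plot(X_test, regr_1.predict(X_test), color="cornflowerblue",
         linewidth=2, label="max_depth=2")
plt.plot(X_test, regr_2.predict(X_test), color="yellowgreen",
         linewidth=2, label="max_depth=5, min_samples_leaf=10")
plt.xlabel("data")
plt.ylabel("target")
plt.title("Decision Tree Regression")
plt.legend()
plt.savefig("tree_regression.png")
```

The step shapes of the two lines make the depth difference visible: the blue line has only a few flat plateaus, the green line has more, smaller ones.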
🔵 Shallow Tree (max_depth=2)
- Captures overall trend
- Misses detail
- Underfitting
🟢 Deeper Tree (max_depth=5)
- Captures more patterns
- More flexible
- Would risk overfitting without the leaf constraint
🛡️ What is Regularization?
Regularization controls how complex your model becomes.
Key Techniques:
- max_depth → limits tree size
- min_samples_leaf → prevents tiny splits
Think of it like guardrails on a winding road: the tree can still follow the real curves in the data, but it can't swerve toward every noisy point.
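To watch regularization pay off, sweep `max_depth` and score each setting by cross-validation. A sketch — the folds are shuffled (the data is sorted by X, so unshuffled folds would test on unseen X ranges), and exact numbers depend on the seed:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(16))

# Shuffled folds: X is sorted, so plain splits would force extrapolation
cv = KFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for depth in [1, 2, 3, 5, 8, None]:
    model = DecisionTreeRegressor(max_depth=depth, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv,
                             scoring="neg_mean_squared_error")
    results[depth] = -scores.mean()  # flip sign back to a plain MSE
    print(f"max_depth={depth}: CV MSE = {results[depth]:.3f}")
```

Held-out error typically drops as depth grows from 1, then flattens or worsens once the tree starts fitting noise — the sweet spot is the regularized middle.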
💡 Key Takeaways
- Shallow trees = simple but may underfit
- Deep trees = powerful but may overfit
- Regularization balances both
- Math helps explain model behavior clearly
🎯 Final Insight
A perfect model is not the one that fits the training data best…
It’s the one that performs best on unseen data.