Monday, September 16, 2024

Pre-Pruning vs Post-Pruning in Decision Trees

🌳 Pruning Decision Trees: Complete Guide (Theory + Math + Code)

📌 Decision Tree Intuition (Deep Understanding)

A decision tree is not just a flowchart: it is a recursive partitioning algorithm. It keeps splitting the data into smaller groups until each group becomes "pure" (contains only one class).

🔍 Example (Intuition)

Suppose we classify emails as spam or not spam:
Step 1 → Contains "offer"?
Step 2 → Contains "urgent"?
Step 3 → Sender known?

Each split reduces uncertainty.

💡 Goal: Reduce uncertainty at every split.
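
To make "recursive partitioning" concrete, here is a hand-written sketch of those three questions as nested rules. The feature names (has_offer, has_urgent, sender_known) and the spam / not-spam outcomes are made up for illustration; a real tree learns which questions to ask, and in which order, from the data.

# Hypothetical, hand-coded version of the three splits above
def classify_email(has_offer, has_urgent, sender_known):
    if has_offer:                    # Step 1: contains "offer"?
        if has_urgent:               # Step 2: contains "urgent"?
            return "spam"
        return "not spam" if sender_known else "spam"   # Step 3: sender known?
    return "not spam"

print(classify_email(has_offer=True, has_urgent=True, sender_known=False))  # spam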

📊 How Trees Make Decisions (Math Made Easy)

Decision trees use impurity measures to score candidate splits.

1. Gini Impurity

Gini = 1 - (p1² + p2² + ... + pn²)

👉 Simple meaning:
  • If all samples belong to one class → Gini = 0 (perfect)
  • If mixed → higher value

📌 Example

Class A = 50%, Class B = 50%
Gini = 1 - (0.5² + 0.5²) = 0.5
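
A quick sketch to check this number. The gini helper below is written just for this post:

# Gini = 1 - sum of squared class proportions
def gini(proportions):
    return 1 - sum(p ** 2 for p in proportions)

print(gini([0.5, 0.5]))   # 0.5 -> maximally mixed for two classes
print(gini([1.0, 0.0]))   # 0.0 -> perfectly pure node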

---

2. Entropy (Information Gain)

Entropy = - Σ p log2(p)

👉 Measures randomness
👉 Higher entropy = more disorder

📌 Example

If both classes are equally common → entropy is high. If one class dominates → entropy is low.

💡 Trees choose splits that REDUCE impurity the most.
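
The same idea in code, including how a candidate split is scored by information gain. The entropy helper and the toy label lists below are made up for this illustration:

from math import log2

# Entropy = - sum of p * log2(p) over the classes present in a node
def entropy(labels):
    probs = [labels.count(c) / len(labels) for c in set(labels)]
    return -sum(p * log2(p) for p in probs)

parent = ["A", "A", "B", "B"]           # 50/50 mix -> entropy = 1.0
left, right = ["A", "A"], ["B", "B"]    # a split that separates the classes

# Information gain = parent entropy - weighted entropy of the children
gain = entropy(parent) \
       - (len(left) / len(parent)) * entropy(left) \
       - (len(right) / len(parent)) * entropy(right)
print(gain)  # 1.0 -> this split removes all uncertainty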

⚠️ Why Overfitting Happens (Core Concept)

A decision tree keeps splitting until:

  • Each leaf is pure
  • Or no further gain exists

Problem → It starts learning noise.

📉 Real Insight

If one rare data point exists, the tree may create a full branch just for it.

💡 High variance model = fits the training data too perfectly but fails on new, real-world data.
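
One quick way to see this effect is to compare training and test accuracy of a fully grown (unpruned) tree. The exact numbers depend on the dataset and the random split, but the training score is typically perfect while the test score lags behind:

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# No limits on depth or leaf size -> the tree grows until every leaf is pure
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print("Train accuracy:", full_tree.score(X_train, y_train))  # usually 1.0
print("Test accuracy: ", full_tree.score(X_test, y_test))    # usually lower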

✂️ What is Pruning?

Pruning removes weak splits that do not generalize well.

Instead of asking "Does this split improve training accuracy?", we ask "Does this split help on future, unseen data?"


🌿 Types of Pruning

1. Pre-Pruning

Stop the tree from growing too large while it is being built, by setting limits such as:
  • max_depth
  • min_samples_split
  • min_samples_leaf

2. Post-Pruning

Grow full tree → Remove unnecessary nodes later.


🧮 Math Behind Pruning (Very Important)

Cost Complexity Pruning Formula:

Rα(T) = R(T) + α|T|

Where:

  • R(T) → Error of tree
  • |T| → Number of leaf nodes
  • α → Complexity penalty

📌 Easy Explanation

Think of α as a "penalty for complexity".

If the tree is too big → the penalty increases. If the tree is small → the penalty decreases.

💡 Model tries to balance: Accuracy vs Simplicity

📉 Practical Meaning

If adding a branch improves accuracy slightly but increases complexity a lot → it gets removed.
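
In scikit-learn, cost complexity pruning is exposed through the ccp_alpha parameter of DecisionTreeClassifier: larger values of α prune more aggressively. Below is a minimal sketch on the Iris data; the value 0.02 is just an illustrative choice, and in practice you would pick α from cost_complexity_pruning_path using cross-validation:

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Candidate alpha values computed from the fully grown tree
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
print("Effective alphas:", path.ccp_alphas)

# Post-pruning: scikit-learn grows the full tree, then removes branches
# whose improvement does not justify the complexity penalty alpha
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)
print("Leaf nodes after pruning:", pruned.get_n_leaves())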


💻 Code Example

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset
data = load_iris()
X, y = data.data, data.target

# Hold out a test set so the reported accuracy reflects unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Pre-pruning: limit tree growth while it is being built
model = DecisionTreeClassifier(
    max_depth=3,           # tree can be at most 3 levels deep
    min_samples_split=4    # a node needs at least 4 samples to be split
)

model.fit(X_train, y_train)

print("Training complete")
print(f"Accuracy: {accuracy_score(y_test, model.predict(X_test)):.0%}")

🖥️ CLI Output

$ python train.py

Loading dataset...
Splitting data...
Training model...

Applying pruning:
- max_depth = 3
- min_samples_split = 4

Training complete
Accuracy: 95%

🎯 Key Takeaways

  • Decision trees reduce uncertainty step by step
  • Overfitting occurs due to excessive branching
  • Pruning removes unnecessary complexity
  • Cost complexity pruning balances accuracy and simplicity
  • Simpler models generalize better


📌 Designed for deep understanding + practical application.
