
Tuesday, September 24, 2024

Decision Tree Classifier Performance on Iris Dataset: Pruning vs. No Pruning


🌸 Decision Tree Classifier on Iris Dataset (With & Without Pruning)

This guide walks you through one of the most fundamental machine learning problems using the Iris dataset. The goal is not just to build a model, but to understand in depth how pruning affects decision trees.


🚀 Introduction

The Iris dataset is one of the most famous datasets in machine learning. It contains 150 flower samples from three species: setosa, versicolor, and virginica. Each sample has four features, all measured in centimeters:

  • Sepal Length
  • Sepal Width
  • Petal Length
  • Petal Width

The task is to classify the species using a Decision Tree Classifier.


📊 Understanding the Dataset

The dataset is perfectly balanced:

  • 50 samples per species
  • 3 classes total
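
A quick check of that balance (a minimal sketch; np.bincount tallies the samples per class label):

from sklearn.datasets import load_iris
import numpy as np

X, y = load_iris(return_X_y=True)
print(np.bincount(y))   # -> [50 50 50], i.e. 50 samples per species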

This balance is one reason why decision trees perform exceptionally well here.


🌳 Decision Tree Basics

A Decision Tree works by splitting the data on feature thresholds. Each split is chosen to maximize class separation, which is made precise by the impurity measures in the next section.

Example rule:

if petal_length < 2.5 → Setosa
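
To see such rules learned from data rather than written by hand, scikit-learn's export_text can print a fitted tree. A minimal sketch that fits a depth-1 stump just to expose the first split (the exact threshold learned on Iris is about 2.45):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
stump = DecisionTreeClassifier(max_depth=1).fit(iris.data, iris.target)

# Prints the learned if/else rule, e.g. "petal length (cm) <= 2.45"
# (a petal-width split separates Setosa equally well, so either may appear)
print(export_text(stump, feature_names=iris.feature_names))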

🧠 Mathematics Behind Decision Trees

1. Gini Impurity

\[ Gini = 1 - \sum_{i=1}^{n} p_i^2 \]

Where \( p_i \) is the probability of class \( i \).

Explanation:

Gini measures how mixed the classes are. Lower is better.
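
A tiny worked example of the formula (a minimal sketch in plain NumPy):

import numpy as np

def gini(p):
    # Gini impurity for a vector of class probabilities p_i
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

print(gini([1.0, 0.0, 0.0]))    # pure node -> 0.0
print(gini([1/3, 1/3, 1/3]))    # evenly mixed, 3 classes -> ~0.667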

2. Entropy

\[ Entropy = -\sum_{i=1}^{n} p_i \log_2(p_i) \]

This measures randomness in the dataset.

3. Information Gain

\[ IG = Entropy(parent) - \sum (weight \times Entropy(child)) \]

This tells us how much a split improves classification.
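
As a concrete check of the last two formulas: splitting a 50/50 two-class node into two pure, equally sized children yields an information gain of exactly 1 bit (a minimal sketch):

import numpy as np

def entropy(p):
    # Shannon entropy in bits; zero-probability classes contribute nothing
    p = np.asarray([pi for pi in p if pi > 0], dtype=float)
    return -np.sum(p * np.log2(p))

parent = entropy([0.5, 0.5])                            # 1.0 bit
children = 0.5 * entropy([1.0]) + 0.5 * entropy([1.0])  # both pure -> 0.0
print(parent - children)                                # IG = 1.0, a perfect split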


📌 Baseline Model (No Pruning)

Code Example

from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

CLI Output

Train Accuracy: 100%
Test Accuracy: 100%
Tree Depth: 6
Leaves: 10

The unpruned tree classifies both the training and test sets perfectly. On most datasets, 100% train accuracy would signal overfitting, but here the test score confirms that the full tree genuinely generalizes.
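
The post does not state how the data was split, so the self-contained sketch below assumes a 70/30 train/test split (test_size=0.3 matches the sample counts implied by the accuracies reported later); score, get_depth, and get_n_leaves are standard scikit-learn calls:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)    # assumed split parameters

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print(f"Train Accuracy: {model.score(X_train, y_train):.0%}")
print(f"Test Accuracy: {model.score(X_test, y_test):.0%}")
print(f"Tree Depth: {model.get_depth()}")
print(f"Leaves: {model.get_n_leaves()}")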


✂️ Pruning Explained

Pruning removes unnecessary branches to reduce overfitting.

Cost Complexity Formula

\[ R_\alpha(T) = R(T) + \alpha |T| \]

  • \(R(T)\): Error of tree
  • \(|T|\): Number of leaves
  • \(\alpha\): Regularization parameter
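
scikit-learn implements exactly this scheme. cost_complexity_pruning_path returns the sequence of effective alphas at which the full tree loses a branch; a minimal sketch, reusing X_train and y_train from the baseline section:

from sklearn.tree import DecisionTreeClassifier

path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train)
print(path.ccp_alphas)    # candidate alphas, from no pruning to root-only
print(path.impurities)    # total leaf impurity R(T) at each alpha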

📉 Pruning Results

ccp_alpha = 0.00924
Train Accuracy: 99%
Test Accuracy: 100%
Depth: 6
Leaves: 8

ccp_alpha = 0.01270
Train Accuracy: 97.14%
Test Accuracy: 100%
Depth: 4
Leaves: 6

ccp_alpha = 0.01847
Train Accuracy: 94.29%
Test Accuracy: 100%
Depth: 3
Leaves: 4

ccp_alpha = 0.02706
Train Accuracy: 94.29%
Test Accuracy: 97.78%
Depth: 2
Leaves: 3

ccp_alpha = 0.25029
Train Accuracy: 64.76%
Test Accuracy: 71.11%
Depth: 1
Leaves: 2

ccp_alpha = 0.31211
Train Accuracy: 35.24%
Test Accuracy: 28.89%
Depth: 0
Leaves: 1

⚖️ Model Comparison

ccp_alpha   Depth   Leaves   Train Accuracy   Test Accuracy
0.00000     6       10       100%             100%
0.01270     4       6        97%              100%
0.02706     2       3        94%              97%
0.31211     0       1        35%              28%

🧩 Interactive Learning

Try adjusting ccp_alpha yourself and observe how (see the sketch below):

  • Tree depth changes
  • Accuracy changes
  • Overfitting is reduced
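
A minimal loop for that experiment, again assuming the 70/30 split from the baseline sketch; each candidate alpha from the pruning path gets its own tree:

from sklearn.tree import DecisionTreeClassifier

path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train)

for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha)
    tree.fit(X_train, y_train)
    print(f"ccp_alpha = {alpha:.5f}  depth={tree.get_depth()}  "
          f"leaves={tree.get_n_leaves()}  "
          f"train={tree.score(X_train, y_train):.2%}  "
          f"test={tree.score(X_test, y_test):.2%}")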

💡 Key Takeaways

  • Decision Trees can overfit easily
  • Pruning reduces complexity
  • Too much pruning harms performance
  • For the Iris dataset, pruning was unnecessary

🎯 Final Conclusion

In this case, the simplest answer was the best: no pruning. The dataset is clean and well-separated, so a full tree performs perfectly.

However, in real-world datasets, pruning is often essential.
