🌸 Decision Tree Classifier on Iris Dataset (With & Without Pruning)
This guide walks you through one of the most fundamental machine learning problems using the Iris dataset. The goal is not just to build a model, but to understand in depth how pruning affects decision trees.
📑 Table of Contents
- Introduction
- Understanding the Dataset
- Decision Tree Basics
- Mathematics Behind Decision Trees
- Baseline Model
- Pruning Explained
- Pruning Results
- Model Comparison
- Interactive Learning
- Key Takeaways
- Final Conclusion
📘 Introduction
The Iris dataset is one of the most famous datasets in machine learning. It contains 150 samples of flowers belonging to three species (setosa, versicolor, and virginica). Each sample has four features:
- Sepal Length
- Sepal Width
- Petal Length
- Petal Width
The task is to classify the species using a Decision Tree Classifier.
📊 Understanding the Dataset
The dataset is perfectly balanced:
- 50 samples per species
- 3 classes total
This balance is one reason why decision trees perform exceptionally well here.
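The post does not show its data-loading code, but the accuracy fractions reported later (e.g. 97.14% train = 102/105, 97.78% test = 44/45) are consistent with a 70/30 split into 105 training and 45 test samples. A minimal sketch under that assumption (the random_state value is arbitrary):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the 150-sample, 4-feature dataset
X, y = load_iris(return_X_y=True)

# Assumed 70/30 split (105 train / 45 test): the original post does not
# show its split, but the accuracy fractions reported later are
# consistent with these sizes; random_state is arbitrary
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
```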
🌳 Decision Tree Basics
A Decision Tree works by splitting data based on feature thresholds. Each split is chosen to maximize class separation (equivalently, to minimize the impurity measures defined in the next section).
Example rule:
if petal_length < 2.5 → Setosa
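Once a tree is fitted, scikit-learn can print the full set of learned rules in exactly this if/then form. A small sketch, where `model` is the classifier trained in the baseline section below:

```python
from sklearn.tree import export_text

# Print the learned decision rules of a fitted tree
# ("model" is the classifier fitted in the baseline section below)
print(export_text(model, feature_names=[
    "sepal_length", "sepal_width", "petal_length", "petal_width"
]))
```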
🧠 Mathematics Behind Decision Trees
1. Gini Impurity
\[ Gini = 1 - \sum_{i=1}^{n} p_i^2 \]
Where \( p_i \) is the probability of class \( i \).
Gini measures how mixed the classes at a node are: a pure node (all one class) scores 0, while a uniform three-class mix scores 2/3. Lower is better.
2. Entropy
\[ Entropy = -\sum_{i=1}^{n} p_i \log_2(p_i) \]
This measures the uncertainty of the class distribution at a node: 0 bits for a pure node, up to \( \log_2(3) \approx 1.58 \) bits for a uniform three-class mix.
3. Information Gain
\[ IG = Entropy(parent) - \sum_{k} w_k \cdot Entropy(child_k) \]
where \( w_k \) is the fraction of the parent's samples that fall into child \( k \). This tells us how much a split improves class purity.
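To make these formulas concrete, here is a small sketch with hypothetical helper functions (`gini`, `entropy`, and `information_gain` are illustrative names, not part of any library):

```python
import numpy as np

def gini(p):
    """Gini impurity of a class-probability vector."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    """Shannon entropy (in bits) of a class-probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                     # avoid log2(0)
    return -np.sum(p * np.log2(p))

# A pure node is best, a uniform 3-class node is worst:
print(gini([1, 0, 0]), gini([1/3, 1/3, 1/3]))        # 0.0  ~0.667
print(entropy([1, 0, 0]), entropy([1/3, 1/3, 1/3]))  # 0.0  ~1.585

def information_gain(parent, children, weights):
    """Parent entropy minus the sample-weighted entropy of the children."""
    return entropy(parent) - sum(
        w * entropy(c) for w, c in zip(weights, children)
    )

# A split that cleanly separates one class from an even two-class mix:
print(information_gain([1/3, 1/3, 1/3],
                       children=[[1, 0, 0], [0, 0.5, 0.5]],
                       weights=[1/3, 2/3]))           # ~0.918 bits
```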
🚀 Baseline Model (No Pruning)
Code Example
from sklearn.tree import DecisionTreeClassifier

# Unpruned tree: the defaults place no limit on depth or leaves
# (X_train, y_train from the split shown earlier; random_state
# is assumed for reproducibility)
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
CLI Output
Train Accuracy: 100%
Test Accuracy: 100%
Tree Depth: 6
Leaves: 10
The unpruned tree classifies both the training and the test set perfectly.
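These numbers can be read straight off the fitted model; a sketch (exact values depend on the train/test split assumed earlier):

```python
# Report the fitted tree's accuracy and shape
print(f"Train Accuracy: {model.score(X_train, y_train):.0%}")
print(f"Test Accuracy: {model.score(X_test, y_test):.0%}")
print(f"Tree Depth: {model.get_depth()}")
print(f"Leaves: {model.get_n_leaves()}")
```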
✂️ Pruning Explained
Pruning removes unnecessary branches to reduce overfitting.
Cost Complexity Formula
\[ R_\alpha(T) = R(T) + \alpha |T| \]
- \(R(T)\): total misclassification error of the tree
- \(|T|\): number of leaves (tree complexity)
- \(\alpha\): complexity penalty; larger values prune more aggressively
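In scikit-learn, this is implemented as minimal cost-complexity pruning via the `ccp_alpha` parameter. The candidate alphas reported below were presumably obtained from `cost_complexity_pruning_path`; a sketch:

```python
from sklearn.tree import DecisionTreeClassifier

# Compute the effective alphas at which subtrees get pruned away,
# then fit and score one tree per candidate alpha
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train
)

for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=42)
    tree.fit(X_train, y_train)
    print(f"ccp_alpha={alpha:.5f}  depth={tree.get_depth()}  "
          f"leaves={tree.get_n_leaves()}  "
          f"train={tree.score(X_train, y_train):.2%}  "
          f"test={tree.score(X_test, y_test):.2%}")
```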
📈 Pruning Results
- ccp_alpha = 0.00924 → Train Accuracy: 99%, Test Accuracy: 100%, Depth: 6, Leaves: 8
- ccp_alpha = 0.01270 → Train Accuracy: 97.14%, Test Accuracy: 100%, Depth: 4, Leaves: 6
- ccp_alpha = 0.01847 → Train Accuracy: 94.29%, Test Accuracy: 100%, Depth: 3, Leaves: 4
- ccp_alpha = 0.02706 → Train Accuracy: 94.29%, Test Accuracy: 97.78%, Depth: 2, Leaves: 3
- ccp_alpha = 0.25029 → Train Accuracy: 64.76%, Test Accuracy: 71.11%, Depth: 1, Leaves: 2
- ccp_alpha = 0.31211 → Train Accuracy: 35.24%, Test Accuracy: 28.89%, Depth: 0, Leaves: 1
At the largest alpha the tree collapses to a single leaf (depth 0) that always predicts the training-set majority class, so accuracy falls to roughly that class's frequency in each split.
⚖️ Model Comparison
| ccp_alpha | Depth | Leaves | Train Accuracy | Test Accuracy |
|---|---|---|---|---|
| 0.00000 | 6 | 10 | 100% | 100% |
| 0.01270 | 4 | 6 | 97.14% | 100% |
| 0.02706 | 2 | 3 | 94.29% | 97.78% |
| 0.31211 | 0 | 1 | 35.24% | 28.89% |
🧩 Interactive Learning
Try adjusting ccp_alpha (see the sketch after this list) and observe how:
- Tree depth shrinks as alpha grows
- Train accuracy falls while test accuracy initially holds steady
- Overfitting (the gap between train and test accuracy) narrows
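A small hypothetical helper (`try_alpha` is an illustrative name) for running this experiment yourself:

```python
from sklearn.tree import DecisionTreeClassifier

# Fit a pruned tree at a given alpha and report its shape and accuracy
# (X_train, X_test, y_train, y_test from the split shown earlier)
def try_alpha(alpha):
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=42)
    tree.fit(X_train, y_train)
    print(f"alpha={alpha:.5f}: depth={tree.get_depth()}, "
          f"leaves={tree.get_n_leaves()}, "
          f"test accuracy={tree.score(X_test, y_test):.2%}")

try_alpha(0.0)      # full tree
try_alpha(0.02706)  # moderate pruning
try_alpha(0.31211)  # pruned down to a single leaf
```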
💡 Key Takeaways
- Decision Trees can overfit easily
- Pruning reduces complexity
- Too much pruning harms performance
- For the Iris dataset, pruning turned out to be unnecessary
🎯 Final Conclusion
In this case, the simplest answer was the best: no pruning. The dataset is clean and well separated, so the full tree performs perfectly (though moderately pruned trees, up to ccp_alpha = 0.01847, matched its test accuracy with fewer leaves).
However, in real-world datasets, pruning is often essential.