
Tuesday, September 24, 2024

Decision Tree Classifier Performance on Iris Dataset: Pruning vs. No Pruning


🌸 Decision Tree Classifier on Iris Dataset (With & Without Pruning)

This guide walks you through one of the most fundamental machine learning problems using the Iris dataset. The goal is not just to build a model, but to understand in depth how pruning affects decision trees.


🚀 Introduction

The Iris dataset is one of the most famous datasets in machine learning. It contains 150 flower samples from three species: setosa, versicolor, and virginica. Each sample has four features, all measured in centimeters:

  • Sepal Length
  • Sepal Width
  • Petal Length
  • Petal Width

The task is to classify the species using a Decision Tree Classifier.


📊 Understanding the Dataset

The dataset is perfectly balanced:

  • 50 samples per species
  • 3 classes total
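
A quick check of that balance (a minimal sketch; np.bincount tallies the samples per class label):

from sklearn.datasets import load_iris
import numpy as np

X, y = load_iris(return_X_y=True)
print(np.bincount(y))   # -> [50 50 50], i.e. 50 samples per species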

This balance is one reason why decision trees perform exceptionally well here.


🌳 Decision Tree Basics

A Decision Tree works by splitting the data on feature thresholds. Each split is chosen to maximize class separation, which is made precise by the impurity measures in the next section.

Example rule:

if petal_length < 2.5 → Setosa
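
To see such rules learned from data rather than written by hand, scikit-learn's export_text can print a fitted tree. A minimal sketch that fits a depth-1 stump just to expose the first split (the exact threshold learned on Iris is about 2.45):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
stump = DecisionTreeClassifier(max_depth=1).fit(iris.data, iris.target)

# Prints the learned if/else rule, e.g. "petal length (cm) <= 2.45"
# (a petal-width split separates Setosa equally well, so either may appear)
print(export_text(stump, feature_names=iris.feature_names))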

🧠 Mathematics Behind Decision Trees

1. Gini Impurity

\[ Gini = 1 - \sum_{i=1}^{n} p_i^2 \]

Where \( p_i \) is the probability of class \( i \).

Explanation:

Gini measures how mixed the classes are. Lower is better.
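
A tiny worked example of the formula (a minimal sketch in plain NumPy):

import numpy as np

def gini(p):
    # Gini impurity for a vector of class probabilities p_i
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

print(gini([1.0, 0.0, 0.0]))    # pure node -> 0.0
print(gini([1/3, 1/3, 1/3]))    # evenly mixed, 3 classes -> ~0.667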

2. Entropy

\[ Entropy = -\sum_{i=1}^{n} p_i \log_2(p_i) \]

This measures randomness in the dataset.

3. Information Gain

\[ IG = Entropy(parent) - \sum (weight \times Entropy(child)) \]

This tells us how much a split improves classification.
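
As a concrete check of the last two formulas: splitting a 50/50 two-class node into two pure, equally sized children yields an information gain of exactly 1 bit (a minimal sketch):

import numpy as np

def entropy(p):
    # Shannon entropy in bits; zero-probability classes contribute nothing
    p = np.asarray([pi for pi in p if pi > 0], dtype=float)
    return -np.sum(p * np.log2(p))

parent = entropy([0.5, 0.5])                            # 1.0 bit
children = 0.5 * entropy([1.0]) + 0.5 * entropy([1.0])  # both pure -> 0.0
print(parent - children)                                # IG = 1.0, a perfect split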


📌 Baseline Model (No Pruning)

Code Example

from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

CLI Output

Train Accuracy: 100%
Test Accuracy: 100%
Tree Depth: 6
Leaves: 10

The unpruned tree classifies both the training and test sets perfectly. On most datasets, 100% train accuracy would signal overfitting, but here the test score confirms that the full tree genuinely generalizes.
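
The post does not state how the data was split, so the self-contained sketch below assumes a 70/30 train/test split (test_size=0.3 matches the sample counts implied by the accuracies reported later); score, get_depth, and get_n_leaves are standard scikit-learn calls:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)    # assumed split parameters

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print(f"Train Accuracy: {model.score(X_train, y_train):.0%}")
print(f"Test Accuracy: {model.score(X_test, y_test):.0%}")
print(f"Tree Depth: {model.get_depth()}")
print(f"Leaves: {model.get_n_leaves()}")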


✂️ Pruning Explained

Pruning removes unnecessary branches to reduce overfitting.

Cost Complexity Formula

\[ R_\alpha(T) = R(T) + \alpha |T| \]

  • \(R(T)\): Error of tree
  • \(|T|\): Number of leaves
  • \(\alpha\): Regularization parameter
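
scikit-learn implements exactly this scheme. cost_complexity_pruning_path returns the sequence of effective alphas at which the full tree loses a branch; a minimal sketch, reusing X_train and y_train from the baseline section:

from sklearn.tree import DecisionTreeClassifier

path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train)
print(path.ccp_alphas)    # candidate alphas, from no pruning to root-only
print(path.impurities)    # total leaf impurity R(T) at each alpha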

📉 Pruning Results

ccp_alpha = 0.00924
Train Accuracy: 99%
Test Accuracy: 100%
Depth: 6
Leaves: 8

ccp_alpha = 0.01270
Train Accuracy: 97.14%
Test Accuracy: 100%
Depth: 4
Leaves: 6

ccp_alpha = 0.01847
Train Accuracy: 94.29%
Test Accuracy: 100%
Depth: 3
Leaves: 4

ccp_alpha = 0.02706
Train Accuracy: 94.29%
Test Accuracy: 97.78%
Depth: 2
Leaves: 3

ccp_alpha = 0.25029
Train Accuracy: 64.76%
Test Accuracy: 71.11%
Depth: 1
Leaves: 2

ccp_alpha = 0.31211
Train Accuracy: 35.24%
Test Accuracy: 28.89%
Depth: 0
Leaves: 1

⚖️ Model Comparison

ccp_alpha   Depth   Leaves   Train Accuracy   Test Accuracy
0.00000     6       10       100%             100%
0.01270     4       6        97%              100%
0.02706     2       3        94%              97%
0.31211     0       1        35%              28%

🧩 Interactive Learning

Try adjusting ccp_alpha yourself and observe how (see the sketch below):

  • Tree depth changes
  • Accuracy changes
  • Overfitting is reduced
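
A minimal loop for that experiment, again assuming the 70/30 split from the baseline sketch; each candidate alpha from the pruning path gets its own tree:

from sklearn.tree import DecisionTreeClassifier

path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train)

for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha)
    tree.fit(X_train, y_train)
    print(f"ccp_alpha = {alpha:.5f}  depth={tree.get_depth()}  "
          f"leaves={tree.get_n_leaves()}  "
          f"train={tree.score(X_train, y_train):.2%}  "
          f"test={tree.score(X_test, y_test):.2%}")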

💡 Key Takeaways

  • Decision Trees can overfit easily
  • Pruning reduces complexity
  • Too much pruning harms performance
  • For the Iris dataset, pruning was unnecessary

🎯 Final Conclusion

In this case, the simplest answer was the best: no pruning. The dataset is clean and well-separated, so a full tree performs perfectly.

However, in real-world datasets, pruning is often essential.
