Tuesday, September 24, 2024

Cost Complexity Pruning Explained (Decision Trees Made Simple)


🌳 Cost Complexity Pruning (CCP)

Cost Complexity Pruning (CCP) is a pruning technique used in decision tree algorithms to reduce overfitting by balancing model accuracy with tree simplicity.

❓ What Is Cost Complexity Pruning?

CCP introduces a penalty for tree complexity using a parameter called alpha (α). Larger trees are penalized more heavily, encouraging simpler models.

Cost Complexity = R(T) + α × |T|

Where:
  • R(T) = training error of tree T
  • |T| = number of leaf nodes
  • α = complexity penalty
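The formula is simple enough to write down directly. Here is a minimal Python sketch of it (the function name and example numbers are illustrative, not part of any library):

```python
def cost_complexity(training_error, n_leaves, alpha):
    """Cost complexity of a (sub)tree: R(T) + alpha * |T|."""
    return training_error + alpha * n_leaves

# A larger alpha penalizes each additional leaf more heavily.
print(cost_complexity(training_error=10, n_leaves=5, alpha=0.0))  # 10.0: no penalty
print(cost_complexity(training_error=10, n_leaves=5, alpha=1.0))  # 15.0
print(cost_complexity(training_error=10, n_leaves=5, alpha=2.0))  # 20.0
```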

⚙️ How CCP Works

The decision tree is grown to its maximum depth, often resulting in overfitting.

Each possible subtree is evaluated using the cost complexity function.

  • Low α → larger tree, higher training accuracy
  • High α → smaller tree, stronger pruning

The subtree with the lowest cost complexity is selected as the final model.
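In scikit-learn's CART implementation this procedure is exposed through DecisionTreeClassifier: cost_complexity_pruning_path enumerates the effective α values of the nested subtree sequence, and the ccp_alpha parameter controls how aggressively the fitted tree is pruned. A minimal sketch on the Iris toy dataset (the dataset choice and random_state are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Ask for the pruning path: the effective alphas at which leaves get collapsed.
tree = DecisionTreeClassifier(random_state=0)
path = tree.cost_complexity_pruning_path(X, y)
print("candidate alphas:", path.ccp_alphas)

# Refit with increasing ccp_alpha: larger alpha -> smaller tree.
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X, y)
    print(f"alpha={alpha:.4f}  leaves={pruned.get_n_leaves()}")
```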

๐ŸŽ CCP Example: Fruit Classification

Dataset: Apples, Bananas, Cherries
Features: Weight, Color

With α = 1:

Full tree:  Empirical risk R(T) = 10, leaves |T| = 5 → Cost = 10 + (1 × 5) = 15
Subtree A:  R(T) = 8,  |T| = 3 → Cost = 8 + (1 × 3) = 11
Subtree B:  R(T) = 9,  |T| = 4 → Cost = 9 + (1 × 4) = 13
Subtree C:  R(T) = 10, |T| = 2 → Cost = 10 + (1 × 2) = 12

Subtree A has the lowest cost complexity (11) and is selected as the final model.
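The arithmetic above is easy to verify with the same formula; a quick sketch using only the numbers from the example:

```python
# (empirical risk R(T), number of leaves |T|) for each candidate, as in the example
candidates = {
    "Full tree": (10, 5),
    "Subtree A": (8, 3),
    "Subtree B": (9, 4),
    "Subtree C": (10, 2),
}
alpha = 1

costs = {name: r + alpha * leaves for name, (r, leaves) in candidates.items()}
for name, cost in costs.items():
    print(f"{name}: cost = {cost}")   # 15, 11, 13, 12

best = min(costs, key=costs.get)
print("Selected:", best)              # Subtree A
```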

✅ Final Pruned Tree

The pruned tree is:

  • Simpler
  • Less prone to overfitting
  • More generalizable to unseen data (see the sketch below)
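One way to see this effect is to compare an unpruned tree with a pruned one on held-out data. A minimal sketch, assuming scikit-learn's breast-cancer toy dataset (the dataset, split, and ccp_alpha=0.01 value are illustrative; exact accuracies will vary):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unpruned tree: grown to maximum depth, fits the training data very closely.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pruned tree: ccp_alpha > 0 trades a little training accuracy for simplicity.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_train, y_train)

for name, model in [("unpruned", full), ("pruned", pruned)]:
    print(f"{name}: leaves={model.get_n_leaves()}, "
          f"train acc={model.score(X_train, y_train):.3f}, "
          f"test acc={model.score(X_test, y_test):.3f}")
```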

💡 Key Takeaways
  • CCP balances accuracy and complexity
  • Alpha controls pruning strength
  • The lowest-cost subtree is not always the one with the lowest training error
  • Simpler trees generalize better
  • Used in CART-based decision trees
