Thursday, September 26, 2024

Comparing Train and Test Accuracy in Machine Learning: A Practical Guide

📖 Why Model Evaluation Matters

When building a machine learning model, the goal is NOT just to perform well on the data it was trained on, but to perform well on new, unseen data.

💡 Real goal: Build a model that works in real-world situations, not just on the training data.

📊 What is Train Accuracy?

Train accuracy is the fraction of training examples the model predicts correctly. It tells us how well the model fits the data it learned from.

If it is high → the model fits the training data well (either by learning real patterns or by simply memorizing the examples).

BUT…

💡 High train accuracy alone does NOT mean the model is good.
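To see why, here is a minimal sketch of a deliberately useless "model". The `MemorizingModel` class, the `accuracy` helper, and the coin-flip data are all hypothetical, made up for illustration. The model stores every training example in a dictionary, so it scores perfectly on the training data even though the labels are pure noise, and it can only guess on inputs it has not memorized.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


class MemorizingModel:
    """Hypothetical model that stores every training example verbatim."""

    def fit(self, X, y):
        # Remember the exact label of every training input
        self.table = {tuple(x): label for x, label in zip(X, y)}
        # Fallback for inputs it has never seen: the most common training label
        self.default = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.table.get(tuple(x), self.default) for x in X]


rng = random.Random(0)
# Labels are random coin flips, so there is no real pattern to learn
X = [(i,) for i in range(200)]
y = [rng.randint(0, 1) for _ in X]
X_train, y_train = X[:160], y[:160]
X_test, y_test = X[160:], y[160:]

model = MemorizingModel().fit(X_train, y_train)
train_acc = accuracy(y_train, model.predict(X_train))
test_acc = accuracy(y_test, model.predict(X_test))

print(f"Train Accuracy: {train_acc:.2f}")  # 1.00: every training input is in the table
print(f"Test Accuracy: {test_acc:.2f}")    # a constant guess, so expect roughly 0.5
```

The perfect training score says nothing about the model's usefulness here; only the held-out score exposes that it learned nothing.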

🧪 What is Test Accuracy?

Test accuracy checks how well the model performs on new data it has never seen before.

💡 This is the most important metric for real-world performance, because it is our best estimate of how the model will do on future data.
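The key property of a test set is that it is kept completely separate from the training data. Below is a simplified, pure-Python sketch of the idea behind scikit-learn's `train_test_split` (shuffle the rows, then hold out a fraction). The helper name `simple_split` is hypothetical, and the real function supports more options, such as stratification.

```python
import math
import random


def simple_split(X, y, test_size=0.2, seed=42):
    """Hypothetical helper: shuffle indices, then hold out a fraction for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_size * len(X))  # round the test set size up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


X = [[i, i * 2] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = simple_split(X, y)

print(len(X_train), len(X_test))  # 8 2
# No test example also appears in the training set
assert not any(row in X_train for row in X_test)
```

Because the model never sees the held-out rows during training, its accuracy on them is a fair stand-in for "new data".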

⚠️ Overfitting vs Underfitting

Overfitting:

The model memorizes the training data (including its noise) but performs poorly on new data.

Underfitting:

The model is too simple to capture the underlying patterns, so it performs poorly on both training and test data.

💡 Think like this:
Overfitting = "memorizing answers"
Underfitting = "not studying enough"
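One way to see both failure modes on the same data is to vary model complexity. This sketch swaps in a single `DecisionTreeClassifier` (not the random forest used later) on scikit-learn's breast cancer dataset and changes `max_depth`. A depth-1 tree makes only one split and is a likely candidate for underfitting, while an unlimited-depth tree can keep splitting until it fits the training data almost perfectly. The exact numbers depend on the split and the library version, so none are claimed here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

results = {}
# max_depth=None lets the tree keep splitting until its leaves are pure
# (or too small to split further)
for depth in [1, 3, None]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    tree.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, tree.predict(X_train))
    test_acc = accuracy_score(y_test, tree.predict(X_test))
    results[depth] = (train_acc, test_acc)
    print(f"max_depth={depth}: train={train_acc:.2f}, test={test_acc:.2f}")
```

When reading the printout, compare each train/test pair: a gap that widens as depth grows, while test accuracy stops improving, is the overfitting pattern described above.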

📊 Step-by-Step Example

We will use a real dataset: the Breast Cancer Wisconsin (Diagnostic) dataset that ships with scikit-learn.

  • Features → 30 numeric measurements of cell nuclei, computed from tumor images
  • Target → malignant (0) or benign (1)

💻 Code Example

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load data
data = load_breast_cancer()
X = data.data
y = data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Train accuracy
train_pred = model.predict(X_train)
train_acc = accuracy_score(y_train, train_pred)

# Test accuracy
test_pred = model.predict(X_test)
test_acc = accuracy_score(y_test, test_pred)

print(f"Train Accuracy: {train_acc:.2f}")
print(f"Test Accuracy: {test_acc:.2f}")

🖥 CLI Output

Train Accuracy: 1.00
Test Accuracy: 0.96

📊 How to Interpret Results

  • Train >> Test → Overfitting
  • Train ≈ Test (both high) → Good model
  • Both low → Underfitting
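
💡 Ideal case: Train and Test accuracy are both high and close to each other. In the example above, the gap is only about 4 percentage points (1.00 vs 0.96), so the random forest generalizes well even though it fits the training data perfectly.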
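The rules of thumb above can be written as a small helper. The function name `diagnose` and both thresholds (a train/test gap above 0.05 counts as overfitting, accuracy below 0.80 counts as low) are hypothetical choices for illustration; sensible values depend on the problem and on what a trivial baseline would score.

```python
def diagnose(train_acc, test_acc, max_gap=0.05, low=0.80):
    """Hypothetical rule of thumb for reading a train/test accuracy pair.

    max_gap and low are illustrative thresholds, not standard values.
    """
    gap = train_acc - test_acc
    if train_acc < low and test_acc < low:
        return "underfitting"  # both scores are low
    if gap > max_gap:
        return "overfitting"   # train is much higher than test
    return "good fit"          # both high and close together


print(diagnose(1.00, 0.96))  # good fit (gap of 0.04)
print(diagnose(0.99, 0.75))  # overfitting
print(diagnose(0.65, 0.62))  # underfitting
```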

🎯 Key Takeaways

✔ Always check test accuracy
✔ Train accuracy alone is misleading
✔ Overfitting is very common
✔ Goal = good performance on unseen data


🚀 Final Thought

A good model is not the one that performs best on training data, but the one that performs reliably on new data.
