Train vs Test Accuracy Explained (Simple + Practical Guide)
Table of Contents
- Why Model Evaluation Matters
- What is Train Accuracy?
- What is Test Accuracy?
- Overfitting vs Underfitting
- Step-by-Step Example
- Code Example
- CLI Output
- How to Interpret Results
- Key Takeaways
- Related Articles
Why Model Evaluation Matters
When building a machine learning model, the goal is not just to perform well on data the model has already seen, but to perform well on new, unseen data.
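What is Train Accuracy?
Train accuracy is the fraction of training examples the model predicts correctly. It tells us how well the model fits the data it learned from.
If it is high → the model has fit the training data well.
But a high train accuracy alone does not show that the model will work on new data.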
🧪 What is Test Accuracy?
Test accuracy checks how well the model performs on new data it has never seen before.
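Train and test accuracy are the same metric, the fraction of predictions that match the true labels, computed on different data. A minimal sketch in plain Python follows. The `accuracy` helper and the toy label lists are made up for illustration and do not come from any library:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels, invented for illustration only
train_true = [1, 0, 1, 1, 0, 1, 0, 0]
train_pred = [1, 0, 1, 1, 0, 1, 0, 0]  # all 8 correct
test_true = [1, 0, 1, 1]
test_pred = [1, 0, 0, 1]               # 3 of 4 correct

train_acc = accuracy(train_true, train_pred)
test_acc = accuracy(test_true, test_pred)
print("Train accuracy:", train_acc)  # 1.0
print("Test accuracy:", test_acc)    # 0.75
```

scikit-learn's `accuracy_score` computes the same quantity by default, so the helper is only there to make the definition concrete.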
⚠️ Overfitting vs Underfitting
Overfitting:
Model memorizes training data but fails on new data.
Underfitting:
Model is too simple and fails on both training and test data.
Overfitting = "memorizing answers"
Underfitting = "not studying enough"
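One way to see both failure modes is to vary model complexity. The sketch below is an added illustration, not part of the main example in this guide. It assumes scikit-learn is installed and swaps in a `DecisionTreeClassifier` with different `max_depth` values on scikit-learn's Breast Cancer dataset. A depth-1 tree is a candidate for underfitting, while an unrestricted tree can memorize the training set.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

results = {}
# max_depth=None lets the tree keep splitting until its leaves are pure
for depth in (1, 3, None):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    tree.fit(X_train, y_train)
    # .score() returns mean accuracy for classifiers
    results[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))

for depth, (train_acc, test_acc) in results.items():
    print(f"max_depth={depth}: train={train_acc:.3f}, test={test_acc:.3f}")
```

The exact numbers depend on the dataset and the split, so compare the gap between train and test in each row rather than the absolute values.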
Step-by-Step Example
We will use a real dataset: the Breast Cancer Wisconsin dataset bundled with scikit-learn.
- Features → 30 numeric measurements of tumor cell nuclei
- Target → benign or malignant
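Before training anything, it can help to look at what the dataset contains. A short sketch, assuming scikit-learn is installed. The variable names `n_samples` and `n_features` are chosen here for readability:

```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# data.data is a 2-D array: one row per tumor sample, one column per feature
n_samples, n_features = data.data.shape
print("Samples:", n_samples)
print("Features:", n_features)

# target_names maps the integer labels in data.target to class names
print("Classes:", ", ".join(data.target_names))
print("First features:", ", ".join(data.feature_names[:3]))
```

The class names are listed in the order of their integer labels, so the first name printed corresponds to label 0.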
💻 Code Example
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# Load data
data = load_breast_cancer()
X = data.data
y = data.target
# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Train model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
# Train accuracy
train_pred = model.predict(X_train)
train_acc = accuracy_score(y_train, train_pred)
# Test accuracy
test_pred = model.predict(X_test)
test_acc = accuracy_score(y_test, test_pred)
# Print both scores rounded to two decimals
print(f"Train Accuracy: {train_acc:.2f}")
print(f"Test Accuracy: {test_acc:.2f}")
🖥 CLI Output
Train Accuracy: 1.00
Test Accuracy: 0.96
How to Interpret Results
- Train >> Test → Overfitting
- Train ≈ Test (both high) → Good model
- Both low → Underfitting
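The three rules above can be sketched as a small helper. This is a rough heuristic, not a standard test: the function name `diagnose` and the thresholds `max_gap=0.10` and `min_acc=0.70` are assumptions picked for illustration, and sensible values depend on the problem.

```python
def diagnose(train_acc, test_acc, max_gap=0.10, min_acc=0.70):
    """Rough label for a train/test accuracy pair.

    max_gap and min_acc are arbitrary illustrative thresholds,
    not standard values; tune them for your own problem.
    """
    if train_acc < min_acc and test_acc < min_acc:
        return "possible underfitting"
    if train_acc - test_acc > max_gap:
        return "possible overfitting"
    return "reasonable fit"


print(diagnose(1.00, 0.96))  # reasonable fit
print(diagnose(0.99, 0.75))  # possible overfitting
print(diagnose(0.60, 0.58))  # possible underfitting
```

Under these assumed thresholds, the result from the example (train 1.00, test 0.96) falls in the "Train ≈ Test (both high)" case: the gap is small and both scores are high.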
🎯 Key Takeaways
✔ Train accuracy alone is misleading
✔ Overfitting is very common
✔ Goal = good performance on unseen data
Related Articles
- Train vs Validation vs Test
- Parameter Tuning Guide
- Sentence Categorization
- Softmax vs Probability
- Beyond Accuracy
Final Thought
A good model is not the one that performs best on training data, but the one that performs reliably on new data.