🤖 K-Nearest Neighbors (KNN) – How to Choose the Right K
Choosing the right value of K in KNN can make or break your model: too small, and the model overfits; too large, and it underfits.
📑 Table of Contents
- What is KNN?
- Math Behind KNN
- Role of K
- Factors Affecting K
- Finding Optimal K
- Code Example
- CLI Output
- Key Takeaways
- Related Articles
📘 What is KNN?
KNN is a simple, instance-based algorithm that classifies a data point by the majority class (or, for regression, the average value) of its K nearest neighbors in feature space.
📐 Math Behind KNN (Simple)
1. Distance Calculation
\[ d = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2} \]
This is called Euclidean distance.
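As a quick illustration, here is a minimal NumPy sketch of this distance (the two sample points are made-up values):

```python
import numpy as np

def euclidean_distance(x, y):
    # Square the coordinate-wise differences, sum them, take the root
    return np.sqrt(np.sum((np.asarray(x) - np.asarray(y)) ** 2))

print(euclidean_distance([1, 2], [4, 6]))  # 5.0
```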
2. Prediction Rule
For classification (majority vote):
\[ \hat{y} = \text{mode}(y_1, \ldots, y_K) \]
For regression:
\[ \hat{y} = \frac{1}{K} \sum_{i=1}^{K} y_i \]
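A minimal sketch of both prediction rules in plain Python, assuming the labels/values of the K nearest neighbors have already been collected:

```python
from collections import Counter

def predict_class(neighbor_labels):
    # Classification: return the most common label among the K neighbors
    return Counter(neighbor_labels).most_common(1)[0][0]

def predict_value(neighbor_values):
    # Regression: return the mean of the K neighbors' target values
    return sum(neighbor_values) / len(neighbor_values)

print(predict_class(["cat", "dog", "cat"]))  # cat
print(predict_value([2.0, 4.0, 6.0]))        # 4.0
```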
🎯 Role of K
| K Value | Effect |
|---|---|
| Small K | High variance (overfitting) |
| Large K | High bias (underfitting) |
📊 Factors to Consider
- Dataset size – larger datasets can support larger values of K
- Data distribution – noisy data benefits from a larger K that smooths out outliers
- Number of features – in high dimensions, distances become less informative (the curse of dimensionality)
- Problem type – for binary classification, an odd K avoids tied votes
🔍 Methods to Find Optimal K
1. Cross Validation
Train and score the model for multiple K values and compare average performance across folds, as in the code example below.
2. Elbow Method
\[ \text{Error}(K) = 1 - \text{Accuracy}(K) \]
Plot error against K and look for the "elbow point" where the curve flattens (see the plotting sketch after this list).
3. Grid Search
Evaluate every candidate K systematically with the same validation scheme and keep the best scorer (a grid-search sketch also follows this list).
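For the elbow method, here is a minimal plotting sketch, assuming `k_values` and the cross-validated `scores` have already been computed as in the code example below (matplotlib is the only extra dependency):

```python
import matplotlib.pyplot as plt

errors = [1 - s for s in scores]  # convert accuracy to error

plt.plot(list(k_values), errors, marker="o")
plt.xlabel("K")
plt.ylabel("Cross-validated error")
plt.title("Error vs. K")
plt.show()
```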
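And a sketch of the grid-search route using scikit-learn's `GridSearchCV`, assuming the same `X` and `y` as in the code example below:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

param_grid = {"n_neighbors": list(range(1, 20))}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # e.g. {'n_neighbors': 5}
```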
💻 Code Example
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# X, y: your feature matrix and labels (any labeled classification dataset)
k_values = range(1, 20)
scores = []

for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy across 5 cross-validation folds
    score = cross_val_score(model, X, y, cv=5).mean()
    scores.append(score)

for k, score in zip(k_values, scores):
    print(f"K={k} → Accuracy: {score:.2f}")
print(f"Best K = {max(zip(scores, k_values))[1]}")
🖥️ CLI Output
K=1 → Accuracy: 0.91
K=5 → Accuracy: 0.95
K=10 → Accuracy: 0.94
Best K = 5
💡 Key Takeaways
- K controls model complexity
- Small K → overfitting
- Large K → underfitting
- Use validation to find best K
🎯 Final Thought
Choosing K is not guesswork—it’s experimentation backed by math.
Once you understand the balance between bias and variance, KNN becomes a powerful and intuitive tool.