WCSS Made Simple (Cluster Quality Explained Clearly)
Table of Contents
- Clustering Goal
- What is WCSS?
- Why WCSS Matters
- How WCSS Works
- Elbow Method
- Example
- Code
- CLI Output
- Common Mistakes
- Key Takeaways
🎯 What is the Goal of Clustering?
Clustering tries to group similar data points together.
What is WCSS?
WCSS (Within-Cluster Sum of Squares) tells us how tight our clusters are.
Simple idea:
- If points are close to their cluster center → WCSS is low (good)
- If points are far from their cluster center → WCSS is high (bad)
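To make the idea concrete, here is a minimal sketch in plain Python. The two point sets and the helper name `cluster_wcss` are made up for illustration, and each set is treated as a single cluster.

```python
def cluster_wcss(points):
    """Sum of squared distances from each 2-D point to the cluster's mean."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)

# Hypothetical data: both sets are centred on (2, 2) but differ in spread
tight = [(1, 2), (2, 2), (3, 2)]
spread = [(-8, 2), (2, 2), (12, 2)]

tight_wcss = cluster_wcss(tight)    # 1 + 0 + 1 = 2.0
spread_wcss = cluster_wcss(spread)  # 100 + 0 + 100 = 200.0
print(tight_wcss, spread_wcss)
```

Both sets have the same center; only the spread differs, and that alone moves WCSS from 2 to 200.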
⭐ Why WCSS Matters
- Helps measure cluster quality
- Used to choose number of clusters (K)
- Gives a single number for comparing different clustering runs
🧮 How WCSS Works (Step-by-Step)
Think of this process:
- Find the center (centroid) of each cluster
- Measure the distance from each point to its cluster's centroid
- Square each distance
- Add up all the squared distances, across every cluster
Formula:
WCSS = Σ (distance between point and centroid)²
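The four steps can be sketched in plain Python as follows. This is an illustrative helper, not scikit-learn's implementation: the function name `wcss`, the five sample points, and the hand-assigned labels are assumptions made for the example.

```python
from collections import defaultdict

def wcss(points, labels):
    """Total within-cluster sum of squares for points with given cluster labels."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    total = 0.0
    for members in clusters.values():
        dims = len(members[0])
        # Step 1: the centroid is the per-dimension mean of the cluster
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        for p in members:
            # Steps 2 and 3: squared Euclidean distance to the centroid
            # Step 4: accumulate into the running total
            total += sum((p[d] - centroid[d]) ** 2 for d in range(dims))
    return total

# Five sample 2-D points, with cluster labels assigned by hand
points = [[1, 2], [2, 3], [3, 4], [10, 11], [11, 12]]
one_cluster = wcss(points, [0, 0, 0, 0, 0])
two_clusters = wcss(points, [0, 0, 0, 1, 1])
print(round(one_cluster, 1), round(two_clusters, 1))  # 178.4 5.0
```

Putting all five points in one cluster gives about 178.4, while separating the two obvious groups brings the total down to 5.0.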
Elbow Method (Very Important)
We usually don’t know the correct number of clusters (K) in advance.
So we try different values of K:
- K = 1 → high WCSS
- K = 2 → lower WCSS
- K = 3 → even lower
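At some point, the improvement slows down: each extra cluster lowers WCSS only a little. That bend in the curve is the “elbow”. WCSS keeps falling as K grows (it reaches 0 when every point is its own cluster), so the goal is to find the bend, not the lowest value.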
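One way to turn “improvement slows down” into a rule is to look at the relative drop in WCSS from one K to the next. The sketch below is a simple heuristic, not a standard algorithm: the function name `find_elbow`, the 20% threshold, and the WCSS values are all assumptions chosen for illustration.

```python
def find_elbow(wcss_values, min_relative_drop=0.20):
    """Return the smallest K after which adding a cluster cuts WCSS by less
    than `min_relative_drop`. wcss_values[0] is assumed to be for K = 1."""
    for i in range(len(wcss_values) - 1):
        current, nxt = wcss_values[i], wcss_values[i + 1]
        if current == 0:
            return i + 1  # already zero; more clusters cannot help
        relative_drop = (current - nxt) / current
        if relative_drop < min_relative_drop:
            return i + 1  # K is 1-based
    return len(wcss_values)  # no slowdown seen in the range tried

# Hypothetical WCSS values for K = 1..5
example_wcss = [120.0, 45.0, 18.0, 15.5, 14.0]
elbow_k = find_elbow(example_wcss)
print(elbow_k)  # 3
```

The threshold is a judgment call. A different cutoff can move the answer, which is one reason the elbow is usually checked by eye as well.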
Simple Example
Imagine grouping marbles:
- 1 group → very spread out → high WCSS
- 2 groups → better grouping
- 3 groups → even better
- After that → small improvement
That stopping point = optimal number of clusters
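The marble picture can be reproduced with numbers. The sketch below places nine hypothetical marbles on a line (the positions are invented for illustration) and, for each K, brute-forces the best way to cut the sorted line into K contiguous groups. That shortcut is only practical for small one-dimensional data; it is not how K-Means itself searches.

```python
from itertools import combinations

def group_wcss(values):
    """Sum of squared distances from each value to the group's mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_wcss(sorted_values, k):
    """Lowest total WCSS over all ways to cut a sorted list into k contiguous groups."""
    n = len(sorted_values)
    best = float("inf")
    # Choose k - 1 cut positions between neighbouring marbles
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        total = sum(group_wcss(sorted_values[a:b]) for a, b in zip(bounds, bounds[1:]))
        best = min(best, total)
    return best

# Hypothetical marble positions: three obvious clumps
marbles = [1, 2, 3, 11, 12, 13, 21, 22, 23]
wcss_by_k = [best_wcss(marbles, k) for k in range(1, 5)]
print(wcss_by_k)  # [606.0, 156.0, 6.0, 4.5]
```

Going from 2 to 3 groups cuts WCSS from 156 to 6, while a fourth group only brings it down to 4.5, so K = 3 is the stopping point for these marbles.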
💻 Code Example
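from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
# Toy data: two visibly separate groups
X = [[1, 2], [2, 3], [3, 4], [10, 11], [11, 12]]
wcss = []
for k in range(1, 6):
    # Fixed n_init and random_state make the run reproducible
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)  # inertia_ is scikit-learn's name for WCSS
print(wcss)
# Plot WCSS against K to look for the elbow
plt.plot(range(1, 6), wcss, marker="o")
plt.xlabel("Number of clusters (K)")
plt.ylabel("WCSS")
plt.show()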
🖥️ CLI Output Example
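[178.4, 5.0, 2.0, 1.0, 0.0]
These are the approximate values for the five points above; the exact digits can differ slightly, and K = 3 may come out a little higher with an unlucky initialization.
Notice how the drop is steep from K = 1 to K = 2 and almost flat afterwards → elbow point at K = 2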
⚠️ Common Mistakes
- Choosing too many clusters
- Ignoring business meaning
- Blindly trusting elbow method
🎯 Key Takeaways
Final Thought
WCSS answers one simple question: “How close are my data points inside each cluster?”