Sunday, September 29, 2024

WCSS in K-Means Clustering Explained for Beginners


🎯 What is the Goal of Clustering?

Clustering tries to group similar data points together.

💡 Good clustering = points inside a group are very similar
💡 Bad clustering = points in a group are far from each other

📖 What is WCSS?

WCSS (Within-Cluster Sum of Squares) tells us how tight our clusters are.

Simple idea:

💡 “Are the points in a cluster close to each other or spread out?”

If points are close → WCSS is low (good)
If points are far → WCSS is high (bad)


⭐ Why WCSS Matters

  • Helps measure cluster quality
  • Used to choose number of clusters (K)
  • Makes clustering more reliable

🧮 How WCSS Works (Step-by-Step)

Think of this process:

  1. Find the center (centroid) of each cluster
  2. Measure the distance of each point from its centroid
  3. Square each distance
  4. Add up all the squared values

Formula:

WCSS = Σ (distance between point and centroid)²
💡 Squaring makes far-away points matter more
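The four steps above can be sketched in a few lines of Python. This is a minimal illustration with a made-up three-point cluster (numpy assumed):

```python
import numpy as np

# A hypothetical cluster of points (illustrative data)
cluster = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])

# Step 1: find the centroid (the mean of the points)
centroid = cluster.mean(axis=0)

# Steps 2-4: squared distance of each point to the centroid, all summed
wcss = np.sum((cluster - centroid) ** 2)

print(wcss)  # 4.0
```

This per-cluster sum, added over every cluster, is exactly what sklearn's KMeans reports as inertia_.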

📈 Elbow Method (Very Important)

We don’t know the correct number of clusters (K).

So we try different values of K:

  • K = 1 → high WCSS
  • K = 2 → lower WCSS
  • K = 3 → even lower

At some point, improvement slows down.

💡 That “bend” in the graph = best K (elbow point)

📊 Simple Example

Imagine grouping marbles:

  • 1 group → very spread out → high WCSS
  • 2 groups → better grouping
  • 3 groups → even better
  • After that → small improvement

👉 That stopping point = optimal clusters


💻 Code Example

from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Tiny toy dataset: two well-separated groups
X = [[1, 2], [2, 3], [3, 4], [10, 11], [11, 12]]

wcss = []

# Try K = 1 to 5 and record the WCSS (sklearn calls it inertia_)
for i in range(1, 6):
    kmeans = KMeans(n_clusters=i, n_init=10, random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)

print(wcss)

# Plot WCSS against K to spot the elbow
plt.plot(range(1, 6), wcss, marker="o")
plt.xlabel("Number of clusters (K)")
plt.ylabel("WCSS")
plt.show()

🖥 Example Output

[178.4, 5.0, 2.0, 0.5, 0.0]

(approximate values; the exact floats will differ slightly)

Notice how the huge drop from K = 1 to K = 2 is followed by only tiny improvements → for this toy data, the elbow is at K = 2
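A rough way to automate the elbow choice is to stop adding clusters once the next K removes only a small fraction of the original (K = 1) WCSS. The sketch below is a simple heuristic, not a standard sklearn function, and the 10% threshold is an arbitrary assumption:

```python
def pick_elbow(wcss, threshold=0.10):
    """wcss[i] holds the WCSS for K = i + 1.
    Return the first K whose next drop removes less than
    `threshold` of the K = 1 WCSS."""
    total = wcss[0]
    for i in range(len(wcss) - 1):
        drop = (wcss[i] - wcss[i + 1]) / total
        if drop < threshold:
            return i + 1
    return len(wcss)

# Illustrative WCSS values for K = 1..5
print(pick_elbow([178.4, 5.0, 2.0, 0.5, 0.0]))  # → 2
```

Dedicated elbow-detection packages exist for real projects, but this fraction-of-total rule is enough to convey the idea.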


⚠️ Common Mistakes

  • Choosing too many clusters
  • Ignoring business meaning
  • Blindly trusting elbow method

🎯 Key Takeaways

✔ WCSS measures cluster tightness
✔ Lower WCSS = better grouping
✔ Used in the elbow method
✔ Helps choose the correct K
✔ Don’t rely only on WCSS — use logic too

🚀 Final Thought

WCSS answers one simple question: “How close are my data points inside each cluster?”

