K-Means Clustering on the Breast Cancer Dataset
Clustering analysis with object-oriented design principles
This example demonstrates how the Breast Cancer dataset can be analyzed using K-means clustering to uncover patterns in tumor data. The solution also applies object-oriented programming concepts such as inheritance and composition to organize the code cleanly.
Problem Overview
The goal is to group breast cancer cell samples into two clusters, ideally corresponding to benign and malignant cases, using unsupervised learning.
- Load and preprocess the dataset
- Apply K-means clustering
- Visualize clustering results
- Use OOP for modular design
Dataset Description
🧬 Breast Cancer Dataset (sklearn)
The dataset contains 30 numerical features describing cell nucleus characteristics, such as:
- Mean radius
- Texture
- Smoothness
- Compactness
Each sample is labeled as either benign or malignant.
Code Walkthrough
1️⃣ Import Required Libraries
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from classInheritanceBreastCancer import Cell
from classCompositionBreastCancer import DataProcessor
This setup separates responsibilities:
- Cell: inheritance-based representation of cell data
- DataProcessor: composition-based preprocessing handler
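The two helper modules themselves are not shown in the post; the following is a minimal sketch of what an inheritance-based Cell and a composition-based DataProcessor might look like. The Sample base class and the exact behavior of preprocess_data are assumptions for illustration, not the post's actual code.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler


class Sample:
    """Generic sample holding a feature vector (hypothetical base class)."""

    def __init__(self, features):
        self.features = np.asarray(features)


class Cell(Sample):
    """Inheritance: a Cell *is a* Sample with an optional diagnosis label."""

    def __init__(self, features, label=None):
        super().__init__(features)
        self.label = label  # benign / malignant, if known


class DataProcessor:
    """Composition: the processor *has a* StandardScaler rather than inheriting from it."""

    def __init__(self, dataset):
        self.dataset = dataset
        self.scaler = StandardScaler()  # composed collaborator

    def preprocess_data(self):
        # Standardize features and return them with the labels
        X_scaled = self.scaler.fit_transform(self.dataset.data)
        return X_scaled, self.dataset.target


# Quick check of the composition-based processor
processor = DataProcessor(load_breast_cancer())
X_scaled, y = processor.preprocess_data()
print(X_scaled.shape)  # (569, 30)
```

Keeping the scaler as a member (composition) rather than subclassing StandardScaler makes it easy to swap in a different scaler later without touching the class hierarchy.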
2️⃣ Load the Dataset
breast_cancer = load_breast_cancer()
This loads the feature matrix and labels from sklearn.
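For orientation, the returned Bunch object can be inspected directly:

```python
from sklearn.datasets import load_breast_cancer

breast_cancer = load_breast_cancer()

print(breast_cancer.data.shape)          # (569, 30): 569 samples, 30 features
print(list(breast_cancer.target_names))  # ['malignant', 'benign']
print(list(breast_cancer.feature_names[:3]))
```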
3️⃣ Preprocess the Data (Scaling)
processor = DataProcessor(breast_cancer)
X_scaled, y = processor.preprocess_data()
Scaling is critical because K-means is sensitive to feature magnitude. Standardization ensures equal contribution from each feature.
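The standardization inside preprocess_data presumably amounts to the z-score formula z = (x − mean) / std applied per feature. This sketch (illustrative, not the post's actual module) shows that sklearn's StandardScaler matches the manual computation:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

X = load_breast_cancer().data

# Standardize with sklearn
X_scaled = StandardScaler().fit_transform(X)

# Equivalent manual computation: z = (x - mean) / std, column by column
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, X_manual))  # True
```

Without this step, large-magnitude features such as mean area would dominate the Euclidean distances K-means relies on.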
4️⃣ Apply K-Means Clustering
kmeans = KMeans(n_clusters=2, random_state=42)
cluster_labels = kmeans.fit_predict(X_scaled)
The algorithm assigns each data point to one of two clusters by minimizing within-cluster variance.
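To sanity-check the fit, the minimized within-cluster variance is available afterwards as the model's inertia_ attribute, and the cluster sizes can be counted directly (the explicit n_init=10 below pins the behavior sklearn used as its default before version 1.4):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_breast_cancer().data)

kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
cluster_labels = kmeans.fit_predict(X_scaled)

# Total within-cluster sum of squared distances (the minimized objective)
print(f"inertia: {kmeans.inertia_:.1f}")
# How many samples landed in each of the two clusters
print(np.bincount(cluster_labels))
```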
5️⃣ Visualize the Clusters
plt.figure(figsize=(10, 6))
plt.scatter(X_scaled[:, 0], X_scaled[:, 1],
c=cluster_labels, cmap='viridis')
plt.xlabel('Mean radius (scaled)')
plt.ylabel('Mean texture (scaled)')
plt.title('K-means Clustering of Cell Features')
plt.colorbar(label='Cluster')
plt.show()
Only the first two features are used for visualization, even though clustering uses all features.
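An alternative not used in the original code is to project all 30 scaled features into two dimensions with PCA before plotting, so the scatter reflects every feature rather than just the first two. A sketch:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_breast_cancer().data)
cluster_labels = KMeans(n_clusters=2, random_state=42, n_init=10).fit_predict(X_scaled)

# Project all 30 scaled features onto the two directions of greatest variance
X_2d = PCA(n_components=2, random_state=42).fit_transform(X_scaled)

plt.figure(figsize=(10, 6))
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=cluster_labels, cmap='viridis')
plt.xlabel('First principal component')
plt.ylabel('Second principal component')
plt.title('K-means clusters in PCA space')
plt.colorbar(label='Cluster')
plt.show()
```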
Explanation of the Solution
📊 Data Preprocessing
Feature scaling ensures that no single feature dominates distance calculations in K-means clustering.
🧠 K-Means Clustering
K-means partitions data into clusters by minimizing the distance between data points and their respective cluster centroids.
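This alternating assignment/update procedure can be made concrete with a toy one-dimensional sketch (illustrative data, not the cancer dataset):

```python
import numpy as np

# Toy data: two obvious groups on a line
points = np.array([1.0, 1.2, 0.8, 8.0, 8.2, 7.8])
centroids = np.array([0.0, 10.0])  # initial centroid guesses

for _ in range(5):
    # Assignment step: each point joins its nearest centroid
    labels = np.argmin(np.abs(points[:, None] - centroids[None, :]), axis=1)
    # Update step: each centroid moves to the mean of its assigned points
    centroids = np.array([points[labels == k].mean() for k in range(2)])

print(centroids)  # -> [1. 8.]
```

Real K-means repeats these two steps in 30-dimensional space until the assignments stop changing.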
📈 Visualization
The scatter plot provides intuitive insight into how the algorithm groups samples based on similarity.
Plot Interpretation
- Each point represents a breast cancer cell sample
- Colors indicate cluster assignment
- Clusters approximate benign vs malignant separation
- Perfect separation is not guaranteed in 2D views
💡 Key Takeaways
- K-means is useful for exploratory pattern discovery
- Scaling is essential for distance-based algorithms
- OOP improves modularity and readability
- Visualization helps interpret clustering quality
- Unsupervised results may not perfectly match labels
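As a concrete check of that last point, the cluster assignments can be compared against the true benign/malignant labels with the adjusted Rand index. This comparison is an addition, not part of the original walkthrough:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_scaled = StandardScaler().fit_transform(data.data)
cluster_labels = KMeans(n_clusters=2, random_state=42, n_init=10).fit_predict(X_scaled)

# 1.0 = perfect agreement with the diagnosis labels, 0.0 = chance level
ari = adjusted_rand_score(data.target, cluster_labels)
print(f"ARI vs. benign/malignant labels: {ari:.2f}")
```

A score well above zero but below one is typical here: the clusters track the diagnosis labels without reproducing them exactly.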