Principal Component Analysis (PCA): Complete Step-by-Step Guide
Principal Component Analysis (PCA) is one of the most important techniques in machine learning and statistics. It helps reduce the number of features in a dataset while preserving the most important information.
Table of Contents
- Introduction
- What is PCA?
- Mathematical Foundation
- Step-by-Step PCA Calculation
- Python Code Example
- Visualization
- Applications
- Limitations
- FAQ
1. Introduction
In real-world datasets, we often deal with many variables (dimensions). PCA helps simplify this complexity by reducing dimensions while keeping the important patterns.
2. What is PCA?
PCA finds new axes (principal components) where:
- PCA1 → captures maximum variance
- PCA2 → captures second maximum variance (orthogonal to PCA1)
Intuition
Imagine rotating a dataset to find the best angle where the spread is maximum. That direction is PCA1.
3. Mathematical Foundation
PCA relies on the covariance matrix and its eigen decomposition.
Covariance Matrix (of the standardized data \( Z \)):
$$ C = \frac{1}{n-1} Z^T Z $$
(Some texts divide by \( n \) instead of \( n-1 \); the principal components are the same either way, since scaling the matrix scales every eigenvalue equally.)
Eigenvalue Equation:
$$ C v = \lambda v $$
- \( \lambda \) = eigenvalue (variance explained along the direction)
- \( v \) = eigenvector (direction of a principal component)
Why Eigenvectors?
They give the directions where variance is maximum. Eigenvalues tell how much variance exists in those directions.
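The eigenvalue equation above can be checked numerically. This small sketch (variable names are my own) decomposes the 2×2 covariance matrix used later in this post and verifies \( C v = \lambda v \):

```python
import numpy as np

# Covariance matrix of two standardized, correlated features
C = np.array([[1.0, 0.8],
              [0.8, 1.0]])

# np.linalg.eigh is the right tool for symmetric matrices;
# it returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(C)
print(eigenvalues)  # approximately [0.2, 1.8]

# The eigenvector paired with the largest eigenvalue is the PCA1 direction
v1 = eigenvectors[:, -1]
print(np.allclose(C @ v1, eigenvalues[-1] * v1))  # True: C v = lambda v
```

For a symmetric matrix with equal diagonal entries like this one, the eigenvectors come out as \( [0.707, 0.707] \) and \( [-0.707, 0.707] \) up to sign, which is exactly what the worked example below uses.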
4. Step-by-Step PCA Calculation
Dataset
| Individual | Height | Weight |
|---|---|---|
| 1 | 150 | 50 |
| 2 | 160 | 60 |
| 3 | 170 | 65 |
| 4 | 180 | 80 |
| 5 | 190 | 90 |
Step 1: Standardization
$$ Z = \frac{X - \mu}{\sigma} $$
Explanation
We standardize each feature to mean 0 and standard deviation 1 so that features measured on different scales (here, centimeters and kilograms) contribute equally.
Step 2: Covariance Matrix
The covariance matrix of the standardized data (which is its correlation matrix):
| | Height | Weight |
|---|---|---|
| Height | 1.0 | 0.8 |
| Weight | 0.8 | 1.0 |
(The off-diagonal value 0.8 is used here for clean arithmetic; the actual correlation in this tiny five-point dataset is closer to 0.99.)
Step 3: Eigenvalues & Eigenvectors
For this matrix the eigenvalues are \( 1 \pm 0.8 \):
- 1.8 → PCA1 (explains 1.8 / 2.0 = 90% of the total variance)
- 0.2 → PCA2
Eigenvectors:
$$ v_1 = [0.707, 0.707] \qquad v_2 = [-0.707, 0.707] $$
Step 4: Projection
$$ PCA = Z \cdot V $$
where \( V \) is the matrix whose columns are the eigenvectors \( v_1, v_2 \). Each row of the result gives one individual's (PCA1, PCA2) coordinates.
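The four steps above can be sketched end-to-end in plain NumPy, without scikit-learn, using the height/weight table from this post (variable names are my own):

```python
import numpy as np

# Height/weight data from the table above
X = np.array([[150, 50],
              [160, 60],
              [170, 65],
              [180, 80],
              [190, 90]], dtype=float)

# Step 1: standardize each column to mean 0, std 1
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: sample covariance matrix of the standardized data
C = (Z.T @ Z) / (len(Z) - 1)

# Step 3: eigen decomposition, then sort eigenvalues descending
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: project the standardized data onto the eigenvectors
scores = Z @ eigvecs
print(scores[:, 0])  # PCA1 coordinates, one per individual
```

A useful sanity check: the sample variance of the PCA1 scores equals the first eigenvalue, which is exactly what "eigenvalue = variance explained" means.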
5. Python Code Example
```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Height/weight data from the table above
data = np.array([
    [150, 50],
    [160, 60],
    [170, 65],
    [180, 80],
    [190, 90],
])

# Standardize, then project onto the two principal components
scaled = StandardScaler().fit_transform(data)
pca = PCA(n_components=2)
result = pca.fit_transform(scaled)
print(result)
```
Output
Each row of `result` is one individual's (PCA1, PCA2) coordinates. For this data the PCA1 values span roughly −2 to +2 while the PCA2 values stay near zero, reflecting that height and weight are strongly correlated and almost all of the variance lies along PCA1. (Exact signs can differ between library versions; a flipped sign does not change the interpretation.)
6. Visualization
PCA transforms data into new axes:
- X-axis → PCA1
- Y-axis → PCA2
Interpretation
Points closer together are more similar. PCA helps reveal clusters and patterns.
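A minimal plotting sketch with matplotlib (file name and styling are my own choices), reusing the scikit-learn pipeline from the code example above:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same height/weight data as before
data = np.array([[150, 50], [160, 60], [170, 65], [180, 80], [190, 90]])
result = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(data))

# Scatter the individuals in principal-component space
plt.scatter(result[:, 0], result[:, 1])
plt.xlabel("PCA1")
plt.ylabel("PCA2")
plt.title("Height/weight data in principal-component space")
plt.savefig("pca_scatter.png")
```

The points line up almost perfectly along the PCA1 axis, which is the visual counterpart of PCA1 capturing nearly all of the variance.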
7. Applications
- Data compression
- Noise reduction
- Visualization of high-dimensional data
- Preprocessing for machine learning
8. Limitations
⚠️ Key Limitations
- Linear method (cannot capture nonlinear patterns)
- Interpretability loss
- Sensitive to scaling
9. FAQ
Is PCA supervised?
No, PCA is unsupervised.
How many components to choose?
A common heuristic is to keep the smallest number of components that together explain about 95% of the total variance.
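The 95% heuristic can be applied with scikit-learn's `explained_variance_ratio_`. A sketch on synthetic data (the dataset and variable names are my own; I add two nearly duplicated columns so the effective dimensionality is below six):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: 6 features, two of which nearly duplicate others
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=100)
X[:, 4] = X[:, 1] + 0.1 * rng.normal(size=100)

# Fit PCA on standardized data and accumulate explained variance
pca = PCA().fit(StandardScaler().fit_transform(X))
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest k whose cumulative explained variance reaches 95%
k = int(np.searchsorted(cumulative, 0.95)) + 1
print(f"Components needed for 95% variance: {k}")
```

scikit-learn also has a built-in shortcut: passing a float in (0, 1), as in `PCA(n_components=0.95)`, keeps exactly enough components to reach that variance fraction.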
Key Takeaways
- PCA reduces dimensions while preserving variance
- PCA1 captures maximum variance
- Eigenvalues = importance
- Eigenvectors = direction