Wednesday, October 2, 2024

A Simple Guide to PCA: How to Calculate PCA1 and PCA2 and Visualize Them




Principal Component Analysis (PCA) is one of the most important techniques in machine learning and statistics. It helps reduce the number of features in a dataset while preserving the most important information.


📌 Table of Contents

  1. Introduction
  2. What is PCA?
  3. Mathematical Foundation
  4. Step-by-Step PCA Calculation
  5. Python Code Example
  6. Visualization
  7. Applications
  8. Limitations
  9. FAQ

1. Introduction

In real-world datasets, we often deal with many variables (dimensions). PCA helps simplify this complexity by reducing dimensions while keeping the important patterns.


2. What is PCA?

PCA finds new axes (principal components) where:

  • PCA1 → captures maximum variance
  • PCA2 → captures second maximum variance (orthogonal to PCA1)
💡 Intuition

Imagine rotating the axes of a scatter plot until one axis lines up with the direction of maximum spread. That direction is PCA1.


3. Mathematical Foundation

PCA relies on the covariance matrix and its eigendecomposition.

Covariance Matrix (here Z is the standardized data matrix with n rows; some texts use 1/(n−1) for the sample covariance):

$$ C = \frac{1}{n} Z^T Z $$

Eigenvalue Equation (applied to C):

$$ Cv = \lambda v $$

  • \( \lambda \) = eigenvalue (how much variance is explained along that direction)
  • \( v \) = eigenvector (the direction itself)
📘 Why Eigenvectors?

The eigenvectors of the covariance matrix point in the directions along which variance is maximized, and the eigenvalues tell how much variance lies along each of those directions.
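
A minimal NumPy sketch of this machinery (the data here is randomly generated just to have two correlated features to work with):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
X = np.column_stack([x, 0.8 * x + 0.3 * rng.normal(size=100)])  # two correlated features

Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize
C = Z.T @ Z / len(Z)                       # C = (1/n) Z^T Z
eigvals, eigvecs = np.linalg.eigh(C)       # solves C v = lambda v

print(eigvals)   # variance along each principal direction
print(eigvecs)   # columns are the directions (eigenvectors)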


4. Step-by-Step PCA Calculation

📊 Dataset

Individual   Height (cm)   Weight (kg)
1            150           50
2            160           60
3            170           65
4            180           80
5            190           90

Step 1: Standardization

$$ Z = \frac{X - \mu}{\sigma} $$

Explanation

We standardize so that each feature has zero mean and unit variance; otherwise the feature with the larger numeric range (height, in this dataset) would dominate the covariance.
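
A quick sketch of this step for the table above (NumPy; scikit-learn's StandardScaler uses the same population standard deviation):

import numpy as np

X = np.array([[150, 50], [160, 60], [170, 65],
              [180, 80], [190, 90]], dtype=float)
Z = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score each column
print(Z.mean(axis=0))  # ~[0, 0]
print(Z.std(axis=0))   # [1, 1]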

Step 2: Covariance Matrix

For standardized data, the covariance matrix equals the correlation matrix:

          Height   Weight
Height    1.0      0.8
Weight    0.8      1.0
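
You can verify this matrix directly; note that for this particular five-row dataset the computed correlation comes out near 0.99, so the 0.8 above is best read as a rounded teaching value:

import numpy as np

X = np.array([[150, 50], [160, 60], [170, 65],
              [180, 80], [190, 90]], dtype=float)
Z = (X - X.mean(axis=0)) / X.std(axis=0)
C = Z.T @ Z / len(Z)   # covariance of standardized data = correlation matrix
print(C)               # diagonal 1.0; off-diagonal ~0.99 for this data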

Step 3: Eigenvalues & Eigenvectors

Eigenvalues (for a 2×2 correlation matrix with off-diagonal value r, the eigenvalues are 1 + r and 1 − r):

  • 1.8 → PCA1
  • 0.2 → PCA2

Eigenvectors:

$$ v_1 = [0.707,\ 0.707], \qquad v_2 = [-0.707,\ 0.707] $$
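
A NumPy check of these numbers (np.linalg.eigh returns eigenvalues in ascending order, and eigenvector signs may be flipped):

import numpy as np

C = np.array([[1.0, 0.8],
              [0.8, 1.0]])
eigvals, eigvecs = np.linalg.eigh(C)
print(eigvals)   # [0.2 1.8]
print(eigvecs)   # columns are eigenvectors; 1.8 pairs with [0.707, 0.707]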

Step 4: Projection

$$ \mathrm{PCA} = Z \cdot V $$

where V holds the eigenvectors as its columns; each row of the product gives one individual's (PCA1, PCA2) coordinates.
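
Putting all four steps together by hand (a sketch; the eigvecs columns are reversed so the PCA1 direction comes first):

import numpy as np

X = np.array([[150, 50], [160, 60], [170, 65],
              [180, 80], [190, 90]], dtype=float)
Z = (X - X.mean(axis=0)) / X.std(axis=0)   # Step 1: standardize
C = Z.T @ Z / len(Z)                       # Step 2: covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)       # Step 3: eigendecomposition (ascending)
V = eigvecs[:, ::-1]                       # largest-variance direction first
scores = Z @ V                             # Step 4: rows are (PCA1, PCA2)
print(scores)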

5. Python Code Example

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Height (cm) and weight (kg) for five individuals
data = np.array([
    [150, 50],
    [160, 60],
    [170, 65],
    [180, 80],
    [190, 90]
])

# Step 1: standardize so both features contribute equally
scaled = StandardScaler().fit_transform(data)

# Steps 2-4: PCA handles covariance, eigendecomposition, and projection
pca = PCA(n_components=2)
result = pca.fit_transform(scaled)

print(result)  # each row holds one individual's (PCA1, PCA2) coordinates

Output (the values shown are illustrative; exact numbers and signs depend on the data and the SVD solver):

[[-1.5  0.5]
 [-0.5  0.3]
 [ 0.0  0.0]
 [ 0.5 -0.4]
 [ 1.5 -0.6]]

6. Visualization

PCA transforms data into new axes:

  • X-axis → PCA1
  • Y-axis → PCA2
📈 Interpretation

Points that sit close together in this plot are similar in the original feature space. PCA often makes clusters and patterns visible that are hard to see across many raw dimensions.
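
A minimal matplotlib sketch of this plot (recomputing the scores from Section 5 so the snippet stands alone):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = np.array([[150, 50], [160, 60], [170, 65], [180, 80], [190, 90]])
result = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(data))

plt.scatter(result[:, 0], result[:, 1])
plt.xlabel("PCA1")
plt.ylabel("PCA2")
plt.title("Height/weight data in principal-component space")
plt.show()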

7. Applications

  • Data compression (see the sketch after this list)
  • Noise reduction
  • Visualization of high-dimensional data
  • Preprocessing for machine learning
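
As a sketch of the compression idea from the list above: keep only PCA1, then map back to the original space. The reconstruction is approximate, and how much is lost depends on how much variance the dropped components carried:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = np.array([[150, 50], [160, 60], [170, 65], [180, 80], [190, 90]])
scaler = StandardScaler()
scaled = scaler.fit_transform(data)

pca = PCA(n_components=1)                 # compress two features into one
compressed = pca.fit_transform(scaled)
restored = scaler.inverse_transform(pca.inverse_transform(compressed))
print(restored)                           # close to, but not exactly, the original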

8. Limitations

⚠️ Key Limitations
  • Linear method (cannot capture nonlinear patterns)
  • Loss of interpretability (each component mixes the original features)
  • Sensitive to feature scaling (standardize first)

9. FAQ

Is PCA supervised?

No, PCA is unsupervised.

How many components to choose?

A common rule of thumb is to keep enough components to explain about 95% of the cumulative variance, as in the sketch below.
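
One way to apply this rule with scikit-learn (you can also pass a fraction directly, e.g. PCA(n_components=0.95)); the random data here is just a placeholder:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # placeholder dataset with 5 features

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
k = int(np.searchsorted(cumulative, 0.95)) + 1  # smallest k reaching ~95%
print(k, cumulative)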

💡 Key Takeaways

  • PCA reduces dimensions while preserving variance
  • PCA1 captures maximum variance
  • Eigenvalues = importance
  • Eigenvectors = direction
