Principal Component Analysis (PCA): Complete Step-by-Step Guide
Principal Component Analysis (PCA) is one of the most important techniques in machine learning and statistics. It helps reduce the number of features in a dataset while preserving the most important information.
Table of Contents
- Introduction
- What is PCA?
- Mathematical Foundation
- Step-by-Step PCA Calculation
- Python Code Example
- Visualization
- Applications
- Limitations
- FAQ
1. Introduction
In real-world datasets, we often deal with many variables (dimensions). PCA helps simplify this complexity by reducing dimensions while keeping the important patterns.
2. What is PCA?
PCA finds new axes (principal components) where:
- PC1 → captures the maximum variance
- PC2 → captures the second-most variance (orthogonal to PC1)
💡 Intuition
Imagine rotating the axes of a dataset to find the angle at which the spread (variance) is largest. That direction is PC1.
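A minimal numpy sketch of this rotation idea (not part of the original walkthrough), using the small height/weight dataset from Section 4: project the centered data onto unit vectors at several angles and watch where the variance of the projection peaks.

```python
import numpy as np

# Height (cm) and weight (kg) from the dataset in Section 4
X = np.array([[150, 50], [160, 60], [170, 65], [180, 80], [190, 90]], float)
Xc = X - X.mean(axis=0)  # center the data

# Variance of the projection onto unit vectors at several angles;
# the angle with the largest variance points along PC1
for angle in np.linspace(0, np.pi, 7):
    u = np.array([np.cos(angle), np.sin(angle)])
    print(f"angle={np.degrees(angle):6.1f} deg  variance={np.var(Xc @ u):8.2f}")
```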
3. Mathematical Foundation
PCA relies on covariance and eigen decomposition.
Covariance Matrix:
$$ C = \frac{1}{n} Z^T Z $$
where \( Z \) is the \( n \times p \) matrix of standardized data (some texts divide by \( n-1 \) for the sample covariance).
Eigenvalue Equation:
$$ C v = \lambda v $$
- \( \lambda \) = eigenvalue (amount of variance explained)
- \( v \) = eigenvector (direction of a principal component)
Why Eigenvectors?
They give the directions where variance is maximum. Eigenvalues tell how much variance exists in those directions.
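As a quick sketch, we can verify the eigenvalue equation above with numpy on a symmetric 2×2 matrix (the same shape as the covariance matrix computed in Step 2 below):

```python
import numpy as np

# A symmetric 2x2 matrix, shaped like the covariance matrix in Step 2
C = np.array([[1.00, 0.99],
              [0.99, 1.00]])

# eigh is numpy's eigensolver for symmetric matrices;
# eigenvalues come back in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(C)
v, lam = eigenvectors[:, -1], eigenvalues[-1]  # largest-variance direction

print(C @ v)    # equals lam * v, confirming C v = lambda v
print(lam * v)
```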
4. Step-by-Step PCA Calculation
Dataset
| Individual | Height (cm) | Weight (kg) |
|---|---|---|
| 1 | 150 | 50 |
| 2 | 160 | 60 |
| 3 | 170 | 65 |
| 4 | 180 | 80 |
| 5 | 190 | 90 |
Step 1: Standardization
$$ Z = \frac{X - \mu}{\sigma} $$
Explanation
We standardize the data so each feature contributes equally, regardless of its original scale.
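As a sketch, the same standardization in plain numpy (using the population standard deviation, ddof=0, which matches scikit-learn's StandardScaler):

```python
import numpy as np

X = np.array([[150, 50], [160, 60], [170, 65], [180, 80], [190, 90]], float)

# Z = (X - mu) / sigma, applied column by column
Z = (X - X.mean(axis=0)) / X.std(axis=0)
print(Z.round(2))
```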
Step 2: Covariance Matrix
For the standardized data above, the covariance matrix (equal to the correlation matrix here) is approximately:
|  | Height | Weight |
|---|---|---|
| Height | 1.00 | 0.99 |
| Weight | 0.99 | 1.00 |
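A short numpy sketch of this step, reusing the standardization from Step 1:

```python
import numpy as np

X = np.array([[150, 50], [160, 60], [170, 65], [180, 80], [190, 90]], float)
Z = (X - X.mean(axis=0)) / X.std(axis=0)  # Step 1: standardize

# Covariance matrix with the 1/n convention from Section 3
C = (Z.T @ Z) / Z.shape[0]
print(C.round(2))  # ~[[1.0, 0.99], [0.99, 1.0]]
```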
Step 3: Eigenvalues & Eigenvectors
Eigenvalues (for a 2×2 correlation matrix of this form they are simply \( 1 \pm r \)):
- 1.99 → PC1
- 0.01 → PC2
Eigenvectors:
$$ v_1 = [0.707, 0.707] $$
$$ v_2 = [-0.707, 0.707] $$
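A numpy sketch of this step; note that np.linalg.eigh returns eigenvalues in ascending order, and eigenvector signs may be flipped relative to the ones written above:

```python
import numpy as np

C = np.array([[1.00, 0.99],
              [0.99, 1.00]])  # covariance matrix from Step 2

eigenvalues, eigenvectors = np.linalg.eigh(C)
print(eigenvalues.round(2))   # ~[0.01, 1.99], ascending
print(eigenvectors.round(3))  # columns pair with the eigenvalues above
```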
Step 4: Projection
$$ \text{scores} = Z \cdot V $$
where the columns of \( V \) are the eigenvectors from Step 3 (PC1 first).
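Putting Steps 1-4 together in a short numpy sketch (V's columns are the eigenvectors, PC1 first):

```python
import numpy as np

X = np.array([[150, 50], [160, 60], [170, 65], [180, 80], [190, 90]], float)
Z = (X - X.mean(axis=0)) / X.std(axis=0)  # Step 1

# Step 4: project onto the eigenvectors from Step 3 (columns: PC1, PC2)
V = np.array([[0.707, -0.707],
              [0.707,  0.707]])
scores = Z @ V
print(scores.round(2))  # column 0 = PC1 scores, column 1 = PC2 scores
```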
5. Python Code Example
```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Height (cm) and weight (kg) for five individuals
data = np.array([
    [150, 50],
    [160, 60],
    [170, 65],
    [180, 80],
    [190, 90],
])

# Step 1: standardize; PCA then handles Steps 2-4 internally
scaled = StandardScaler().fit_transform(data)
pca = PCA(n_components=2)
result = pca.fit_transform(scaled)
print(result)
```
Output (values rounded; the sign of each component may flip depending on the SVD solver and sklearn version):
```
[[-1.94  0.06]
 [-0.95  0.05]
 [-0.20 -0.20]
 [ 1.04  0.04]
 [ 2.04  0.04]]
```
6. Visualization
PCA transforms data into new axes:
- X-axis → PC1
- Y-axis → PC2
Interpretation
Points closer together are more similar. PCA helps reveal clusters and patterns.
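A minimal matplotlib sketch (assuming matplotlib is installed) that plots the sklearn result from Section 5:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = np.array([[150, 50], [160, 60], [170, 65], [180, 80], [190, 90]])
result = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(data))

plt.scatter(result[:, 0], result[:, 1])
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.axhline(0, color="gray", linewidth=0.5)  # new axes through the origin
plt.axvline(0, color="gray", linewidth=0.5)
plt.show()
```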
7. Applications
- Data compression (see the sketch after this list)
- Noise reduction
- Visualization of high-dimensional data
- Preprocessing for machine learning
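As a sketch of the data-compression use case: keep only PC1, then map back to the original space with inverse_transform and check how little is lost.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = np.array([[150, 50], [160, 60], [170, 65], [180, 80], [190, 90]])
scaled = StandardScaler().fit_transform(data)

pca = PCA(n_components=1)                       # keep PC1 only
compressed = pca.fit_transform(scaled)          # 5x1 instead of 5x2
reconstructed = pca.inverse_transform(compressed)

print(pca.explained_variance_ratio_)            # ~[0.995]
print(np.abs(scaled - reconstructed).max())     # small reconstruction error
```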
8. Limitations
⚠️ Key Limitations
- Linear method (cannot capture nonlinear patterns)
- Components are harder to interpret (each is a mix of the original features)
- Sensitive to feature scaling, so standardize first
9. FAQ
Is PCA supervised?
No, PCA is unsupervised.
How many components to choose?
A common rule of thumb is to keep enough components to explain about 95% of the total variance, as shown below.
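In scikit-learn you can pass a float to n_components and PCA will keep the fewest components whose cumulative explained variance reaches that fraction, for example:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = np.array([[150, 50], [160, 60], [170, 65], [180, 80], [190, 90]])
scaled = StandardScaler().fit_transform(data)

pca = PCA(n_components=0.95)  # keep enough components for ~95% variance
pca.fit(scaled)
print(pca.n_components_)              # 1 for this toy dataset
print(pca.explained_variance_ratio_)
```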
💡 Key Takeaways
- PCA reduces dimensions while preserving as much variance as possible
- PC1 captures the maximum variance
- Eigenvalues = how much variance each component explains (importance)
- Eigenvectors = the directions of the new axes