Tuesday, August 27, 2024

How Derivatives Help Optimize Linear Regression Models


📘 Linear Regression — Full Concept + Math + Intuition


📌 What is Linear Regression?

Linear Regression is a statistical and machine learning technique used to model the relationship between an input variable and an output variable.

It tries to answer a simple question: "Can we predict output (y) using input (x)?"

The model assumes a linear relationship:

ŷ = β0 + β1x
  • β0 → Intercept (the predicted value of y when x = 0)
  • β1 → Slope (how much ŷ changes for each one-unit increase in x)
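
In code, the model is nothing more than this equation. A minimal sketch (the names b0 and b1 are placeholders for the fitted intercept and slope):

def predict(x, b0, b1):
    # y-hat = intercept + slope * x
    return b0 + b1 * x

print(predict(3, 1, 1))  # 4, using the coefficients derived later in this post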

❓ Why Do We Need Linear Regression?

In real life, relationships exist everywhere:

  • Hours studied → Marks scored
  • Ad spend → Sales
  • Experience → Salary

Linear regression helps us quantify and predict these relationships.

🧠 Deep Intuition


Imagine plotting the data points on a graph. There are infinitely many lines you could draw through them.

But we want the "best" line.

Best means:

  • Closest to all points
  • Minimum total error

Instead of guessing, we use math to find this optimal line.

📊 Dataset

x   y
1   2
2   3

📉 Residual Sum of Squares (RSS)

Residual = Actual - Predicted

RSS measures total squared error.

RSS = (2 - (β0 + β1*1))^2 + (3 - (β0 + β1*2))^2

Why square?

  • Squaring stops positive and negative residuals from cancelling out
  • It penalizes large errors more heavily
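
For the two-point dataset above, RSS can be written directly as a function of the coefficients. A minimal sketch (the function name rss is just illustrative):

def rss(b0, b1):
    # squared residuals for the points (1, 2) and (2, 3)
    return (2 - (b0 + b1 * 1))**2 + (3 - (b0 + b1 * 2))**2

print(rss(0, 0))  # 13.0 — a poor guess
print(rss(1, 1))  # 0.0 — the optimal coefficients found below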

📐 Full Step-by-Step Derivation (Deep Explanation)


Step 1: Start with RSS

RSS = (2 - β0 - β1)^2 + (3 - β0 - 2β1)^2

Step 2: Expand each term

(2 - β0 - β1)^2 = (2 - β0 - β1)(2 - β0 - β1)
= 4 - 4β0 - 4β1 + β0^2 + 2β0β1 + β1^2

(3 - β0 - 2β1)^2 = (3 - β0 - 2β1)(3 - β0 - 2β1)
= 9 - 6β0 - 12β1 + β0^2 + 4β0β1 + 4β1^2

Step 3: Add both expressions

RSS = (4 + 9)
      + (β0^2 + β0^2)
      + (β1^2 + 4β1^2)
      + (2β0β1 + 4β0β1)
      + (-4β0 - 6β0)
      + (-4β1 - 12β1)

RSS = 13 + 2β0^2 + 5β1^2 + 6β0β1 - 10β0 - 16β1
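
This expansion can be verified symbolically. A quick sketch with SymPy (b0 and b1 stand in for β0 and β1):

import sympy as sp

b0, b1 = sp.symbols('b0 b1')

# RSS for the points (1, 2) and (2, 3)
rss = (2 - b0 - b1)**2 + (3 - b0 - 2*b1)**2

# prints the same polynomial as above (term order may differ):
# 2*b0**2 + 6*b0*b1 - 10*b0 + 5*b1**2 - 16*b1 + 13
print(sp.expand(rss))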

Step 4: Take the derivative w.r.t. β0

Only the terms containing β0 contribute; the constant and the pure-β1 terms differentiate to zero:

d(RSS)/dβ0 = d/dβ0 (2β0^2 + 6β0β1 - 10β0)
= 4β0 + 6β1 - 10

Step 5: Take the derivative w.r.t. β1

Again, only the terms containing β1 survive:

d(RSS)/dβ1 = d/dβ1 (5β1^2 + 6β0β1 - 16β1)
= 10β1 + 6β0 - 16
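
Both partial derivatives can be checked symbolically as well, continuing the SymPy sketch from above:

import sympy as sp

b0, b1 = sp.symbols('b0 b1')
rss = 13 + 2*b0**2 + 5*b1**2 + 6*b0*b1 - 10*b0 - 16*b1

print(sp.diff(rss, b0))  # 4*b0 + 6*b1 - 10
print(sp.diff(rss, b1))  # 6*b0 + 10*b1 - 16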

Step 6: Set derivatives to zero

RSS is a convex (bowl-shaped) function of β0 and β1, so the point where both partial derivatives vanish is its minimum:

4β0 + 6β1 = 10
6β0 + 10β1 = 16

Step 7: Solve using elimination

Multiply first equation by 3:
12β0 + 18β1 = 30

Multiply second equation by 2:
12β0 + 20β1 = 32

Subtract:
(12β0 + 20β1) - (12β0 + 18β1) = 32 - 30
2β1 = 2
β1 = 1

Substitute into first equation:
4β0 + 6(1) = 10
4β0 + 6 = 10
4β0 = 4
β0 = 1

Final Result:

β0 = 1
β1 = 1

ŷ = x + 1

Check: ŷ(1) = 1 + 1 = 2 and ŷ(2) = 2 + 1 = 3, so the line passes through both data points exactly and RSS = 0.
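
The two equations from Step 6 form a small linear system, so the same answer can be cross-checked numerically. A minimal sketch with NumPy's solver:

import numpy as np

# 4β0 + 6β1 = 10
# 6β0 + 10β1 = 16
A = np.array([[4.0, 6.0],
              [6.0, 10.0]])
b = np.array([10.0, 16.0])

print(np.linalg.solve(A, b))  # [1. 1.] → β0 = 1, β1 = 1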


💻 Code Example

import numpy as np
from sklearn.linear_model import LinearRegression

# dataset from the table above: x = [1, 2], y = [2, 3]
X = np.array([1, 2]).reshape(-1, 1)  # sklearn expects a 2-D feature matrix
y = np.array([2, 3])

model = LinearRegression()
model.fit(X, y)

print(model.intercept_)  # β0
print(model.coef_)       # β1

🖥 CLI Output

1.0
[1.0]

💡 Key Takeaways

  • Linear regression models the relationship between an input and an output
  • RSS measures the model's total squared error
  • Setting the derivatives of RSS to zero minimizes that error
  • The result is the mathematically best-fit line

