Tuesday, August 27, 2024

📊 Understanding Residuals and RSS in Linear Regression

📖 Introduction

Linear regression helps us understand relationships between variables. But how do we measure how good our predictions are?

That’s where residuals and RSS (Residual Sum of Squares) come in.

💡 Residual = Actual Value − Predicted Value

📊 Dataset

Hours Studied (x) | Actual Score (y)
2                 | 50
4                 | 60
6                 | 65
8                 | 80

We want to predict how study hours affect scores.

📈 Linear Regression Model

Our model:

ŷ = 5x + 40

This means:

  • For every extra hour studied, the score increases by 5 points
  • The base score (the intercept) is 40

🔽 Why a linear model?

Linear regression assumes a straight-line relationship between variables. It is simple, interpretable, and often effective for small datasets.
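
Before stepping through the numbers, here is the model as a small Python function (the same predict helper that appears in the script at the end of this post):

def predict(x):
    # Model: ŷ = 5x + 40
    return 5 * x + 40

print(predict(2))  # 50: base score 40 plus 5 points for each of 2 hours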

✅ Step 1: Calculate Predictions

ŷ₁ = 5(2) + 40 = 50
ŷ₂ = 5(4) + 40 = 60
ŷ₃ = 5(6) + 40 = 70
ŷ₄ = 5(8) + 40 = 80

We now have predicted values for each data point.

📉 Step 2: Calculate Residuals

Residual₁ = 50 - 50 = 0
Residual₂ = 60 - 60 = 0
Residual₃ = 65 - 70 = -5
Residual₄ = 80 - 80 = 0

Residuals tell us how far off each prediction is.
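
Steps 1 and 2 in Python, as a minimal sketch using the dataset above:

x = [2, 4, 6, 8]
y = [50, 60, 65, 80]

# Step 1: predictions from ŷ = 5x + 40
y_hat = [5 * xi + 40 for xi in x]

# Step 2: residual = actual − predicted
residuals = [yi - yh for yi, yh in zip(y, y_hat)]
print(residuals)  # [0, 0, -5, 0]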

🔽 Why a negative residual?

A negative residual means the model overestimated the value.

🔢 Step 3: Square the Residuals

0² = 0
0² = 0
(-5)² = 25
0² = 0

Squaring removes negative signs and penalizes larger errors.

📌 Step 4: Calculate RSS

RSS = 0 + 0 + 25 + 0 = 25
🎯 RSS measures total prediction error. Lower = better fit.

📊 Mathematical Insight

The RSS formula is:

RSS = Σ (y - ŷ)²

This sums all squared differences between actual and predicted values.
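
The sum translates directly into a vectorized one-liner. A quick sketch, assuming NumPy is available:

import numpy as np

y = np.array([50, 60, 65, 80])      # actual scores
y_hat = np.array([50, 60, 70, 80])  # predictions from ŷ = 5x + 40

rss = np.sum((y - y_hat) ** 2)      # Σ (y - ŷ)²
print(rss)  # 25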

📝 Mathematical Explanation of Residuals and RSS

In linear regression, we quantify error using residuals and RSS.

Residual Definition

The residual for each data point is:

\[ e_i = y_i - \hat{y}_i \]

Where:

  • \( y_i \): actual value
  • \( \hat{y}_i \): predicted value
  • \( e_i \): residual (error)

Residual Sum of Squares (RSS)

The total error across all observations is:

\[ RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]

Applying to Our Example

\[ RSS = (50 - 50)^2 + (60 - 60)^2 + (65 - 70)^2 + (80 - 80)^2 \]

\[ RSS = 0 + 0 + 25 + 0 = 25 \]

Why Squaring?

  • Prevents positive and negative errors from canceling out
  • Penalizes larger errors more strongly
  • Makes optimization mathematically convenient
💡 The goal of regression is to minimize RSS, leading to the best-fitting line.
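
As a concrete illustration (a sketch assuming NumPy is available): the line ŷ = 5x + 40 used in this post is close to, but not exactly, the least-squares line for this dataset. np.polyfit finds the slope and intercept that actually minimize RSS:

import numpy as np

x = np.array([2, 4, 6, 8])
y = np.array([50, 60, 65, 80])

# Fit a degree-1 polynomial (a straight line) by least squares
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)  # ≈ 4.75, ≈ 40.0

# Compare RSS of the fitted line with RSS of ŷ = 5x + 40
y_fit = slope * x + intercept
print(np.sum((y - y_fit) ** 2))         # 17.5
print(np.sum((y - (5 * x + 40)) ** 2))  # 25.0

So the least-squares line (ŷ = 4.75x + 40) achieves a lower RSS than our hand-picked model, which was chosen for easy arithmetic rather than optimality.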

💻 CLI Implementation Example

Code Example

# Dataset: hours studied and actual scores
x = [2, 4, 6, 8]
y = [50, 60, 65, 80]

def predict(x):
    # Model: ŷ = 5x + 40
    return 5 * x + 40

rss = 0

# Accumulate the squared residual for each data point
for i in range(len(x)):
    y_hat = predict(x[i])
    residual = y[i] - y_hat
    rss += residual ** 2

print("RSS:", rss)

CLI Output

$ python regression.py
RSS: 25
🔽 CLI Explanation

The script loops through each data point, computes residuals, squares them, and sums them.
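
As an optional cross-check (a sketch assuming scikit-learn is installed; it is not required by the script above), a library fit reproduces the same least-squares result as the polyfit example earlier:

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[2], [4], [6], [8]])  # scikit-learn expects a 2-D feature matrix
y = np.array([50, 60, 65, 80])

model = LinearRegression().fit(X, y)
y_hat = model.predict(X)
print(model.coef_[0], model.intercept_)  # ≈ 4.75, ≈ 40.0
print(np.sum((y - y_hat) ** 2))          # ≈ 17.5, the minimum achievable RSS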

🎯 Key Takeaways

  • Residuals measure prediction error
  • Negative residual = overestimation
  • Squaring ensures all errors are positive
  • RSS summarizes total model error
  • Lower RSS = better model performance

📘 Final Thoughts

Residuals and RSS are the foundation of error measurement in regression. Understanding them deeply will help you build and evaluate better predictive models.
