# Understanding Residuals and RSS in Linear Regression
## Introduction
Linear regression helps us understand relationships between variables. But how do we measure how good our predictions are?
That's where residuals and RSS (Residual Sum of Squares) come in.
## Dataset
| Hours Studied (x) | Actual Score (y) |
|---|---|
| 2 | 50 |
| 4 | 60 |
| 6 | 65 |
| 8 | 80 |
We want to predict how study hours affect scores.
## Linear Regression Model
Our model:
ŷ = 5x + 40
This means:
- For every extra hour studied, the score increases by 5 points
- The base score starts at 40
**Why a linear model?**
Linear regression assumes a straight-line relationship between variables. It is simple, interpretable, and often effective for small datasets.
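The model above can be sketched as a small Python function. The slope (5) and intercept (40) are taken as given, just as the post does; in practice they would come from fitting the model.

```python
# Model from the post: ŷ = 5x + 40 (slope and intercept taken as given).
def predict(hours):
    """Return the predicted score for a number of study hours."""
    return 5 * hours + 40

hours = [2, 4, 6, 8]
predictions = [predict(h) for h in hours]
print(predictions)  # [50, 60, 70, 80]
```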
## Step 1: Calculate Predictions
ŷ₁ = 5(2) + 40 = 50
ŷ₂ = 5(4) + 40 = 60
ŷ₃ = 5(6) + 40 = 70
ŷ₄ = 5(8) + 40 = 80
We now have predicted values for each data point.
## Step 2: Calculate Residuals
Residual₁ = 50 - 50 = 0
Residual₂ = 60 - 60 = 0
Residual₃ = 65 - 70 = -5
Residual₄ = 80 - 80 = 0
Residuals tell us how far off each prediction is.
**Why a negative residual?**
A negative residual means the model overestimated the value.
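The residual step above can be written as a short sketch over the actual and predicted values from the table:

```python
actual = [50, 60, 65, 80]
predicted = [50, 60, 70, 80]

# Residual = actual value minus predicted value for each observation
residuals = [y - y_hat for y, y_hat in zip(actual, predicted)]
print(residuals)  # [0, 0, -5, 0]
```

The single -5 is the overestimate at x = 6, matching the worked calculation.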
## Step 3: Square the Residuals
0² = 0
0² = 0
(-5)² = 25
0² = 0
Squaring removes negative signs and penalizes larger errors.
## Step 4: Calculate RSS
RSS = 0 + 0 + 25 + 0 = 25
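The final step is just summing the squared residuals, which one line of Python makes explicit:

```python
residuals = [0, 0, -5, 0]  # residuals computed in Step 2

# Square each residual and add them up
rss = sum(r ** 2 for r in residuals)
print("RSS:", rss)  # RSS: 25
```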
## Mathematical Insight
The RSS formula is:
RSS = Σ (y - ŷ)²
This sums all squared differences between actual and predicted values.
## Mathematical Explanation of Residuals and RSS
In linear regression, we quantify error using residuals and RSS.
### Residual Definition
The residual for each data point is:
\[ e_i = y_i - \hat{y}_i \]
Where:
- \( y_i \): actual value
- \( \hat{y}_i \): predicted value
- \( e_i \): residual (error)
### Residual Sum of Squares (RSS)
The total error across all observations is:
\[ RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]
### Applying to Our Example
\[ RSS = (50 - 50)^2 + (60 - 60)^2 + (65 - 70)^2 + (80 - 80)^2 \]
\[ RSS = 0 + 0 + 25 + 0 = 25 \]
### Why Squaring?
- Prevents positive and negative errors from canceling out
- Penalizes larger errors more strongly
- Makes optimization mathematically convenient
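A quick sketch with hypothetical residuals (not from the post's data) shows the first point concretely: opposite-signed errors cancel in a raw sum but not in RSS.

```python
# Hypothetical residuals chosen to illustrate cancellation.
residuals = [3, -3, 3, -3]

raw_sum = sum(residuals)              # cancels to 0 -> looks like a perfect model
rss = sum(r ** 2 for r in residuals)  # 36 -> reveals the real error
print(raw_sum, rss)  # 0 36
```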
## CLI Implementation Example
### Code Example
```python
x = [2, 4, 6, 8]
y = [50, 60, 65, 80]

def predict(x):
    return 5 * x + 40

rss = 0
for i in range(len(x)):
    y_hat = predict(x[i])
    residual = y[i] - y_hat
    rss += residual ** 2

print("RSS:", rss)
```
### CLI Output
```
$ python regression.py
RSS: 25
```
The script loops through each data point, computes residuals, squares them, and sums them.
## Key Takeaways
- Residuals measure prediction error
- Negative residual = overestimation
- Squaring ensures all errors are positive
- RSS summarizes total model error
- Lower RSS = better model performance
## Final Thoughts
Residuals and RSS are foundational to evaluating regression models. Understanding them deeply will help you build better predictive models.