### Simple Example
Suppose you're a student and want to know whether studying more hours leads to better grades. You collect data from three students:
- **Student A:** Studied 2 hours, got 70%
- **Student B:** Studied 4 hours, got 80%
- **Student C:** Studied 6 hours, got 90%
You want to find a line that best fits these points so you can predict the grade for any given number of study hours.
### The Goal
OLS seeks to find the line `y = mx + b`, where:
- `y` is the grade (dependent variable)
- `x` is the number of study hours (independent variable)
- `m` is the slope of the line (indicating how much the grade increases for each additional hour of study)
- `b` is the y-intercept (the predicted grade when no hours are studied)
### How OLS Works
OLS finds the values of `m` and `b` that minimize the **sum of the squared differences** between the actual grades and the grades predicted by the line. These differences are called "residuals."
For each student, the residual is:
Residual = y_actual - y_predicted
OLS minimizes the sum of the squares of these residuals:
Sum of Squared Residuals = Σ(y_actual - y_predicted)²
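As a quick sketch (using the three students above as the data), here is how the sum of squared residuals can be computed for any candidate line. The function name and the guessed slope/intercept values are illustrative, not part of any standard library:

```python
# Data from the example above: hours studied and grade (%).
hours = [2, 4, 6]
grades = [70, 80, 90]

def sum_squared_residuals(m, b):
    """Sum of squared residuals for the candidate line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(hours, grades))

# A rough guess: 10 points per hour, starting from 50%.
print(sum_squared_residuals(10, 50))  # 500
# The OLS line (slope 5, intercept 60) fits this data exactly.
print(sum_squared_residuals(5, 60))   # 0
```

OLS is simply the choice of `m` and `b` that makes this number as small as possible.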
### OLS Formula
For a simple linear regression with one independent variable `x`, the formulas to calculate `m` and `b` are:
m = [n(Σxy) - (Σx)(Σy)] / [n(Σx²) - (Σx)²]
b = [(Σy)(Σx²) - (Σx)(Σxy)] / [n(Σx²) - (Σx)²]
Here, `n` is the number of data points.
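Applying these formulas to the three students above can be done directly; this is a minimal sketch with the sums written out explicitly:

```python
# The example data: hours studied (x) and grades (y).
xs = [2, 4, 6]
ys = [70, 80, 90]
n = len(xs)

sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# The OLS formulas for slope (m) and intercept (b).
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y * sum_x2 - sum_x * sum_xy) / (n * sum_x2 - sum_x ** 2)

print(m, b)        # 5.0 60.0
# Predict the grade for 5 hours of study:
print(m * 5 + b)   # 85.0
```

For this data the fit is exact: each extra hour of study is worth 5 percentage points, on top of a baseline of 60%.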
### Conclusion
Once you have `m` and `b`, you can plug in any value of `x` (hours studied) to predict `y` (the grade).
In summary, OLS finds the line that best fits your data by minimizing the sum of the squared vertical distances between the actual data points and the line. That line can then be used to make predictions.