Wednesday, August 28, 2024

How OLS Regression Works: Simple Explanation with Example

Ordinary Least Squares (OLS) is a method used in statistics to find the best-fitting line through a set of data points. This line is known as the "regression line," and it helps predict the value of a dependent variable (denoted as `y`) based on the value of an independent variable (denoted as `x`).

### Simple Example

Suppose you're a student and want to know if studying more hours leads to better grades. You collect data from several students:

- **Student A:** Studied 2 hours, got 70%
- **Student B:** Studied 4 hours, got 80%
- **Student C:** Studied 6 hours, got 90%

You want to find a line that best fits these points so you can predict the grade for any given number of study hours.

### The Goal

OLS seeks to find the line `y = mx + b`, where:
- `y` is the grade (dependent variable)
- `x` is the number of study hours (independent variable)
- `m` is the slope of the line (indicating how much the grade increases for each additional hour of study)
- `b` is the y-intercept (the predicted grade when no hours are studied)

### How OLS Works

OLS finds the values of `m` and `b` that minimize the **sum of the squared differences** between the actual grades and the grades predicted by the line. These differences are called "residuals."

For each student, the residual is:

Residual = y_actual - y_predicted


OLS minimizes the sum of the squares of these residuals:

Sum of Squared Residuals = Σ(y_actual - y_predicted)²
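
To make the quantity being minimized concrete, here is a minimal Python sketch that computes the sum of squared residuals for the three students above. The function name `sum_squared_residuals` and the two candidate lines are illustrative assumptions, not part of any library.

```python
# Example data from the post: hours studied and grades (in %).
hours = [2, 4, 6]
grades = [70, 80, 90]

def sum_squared_residuals(m, b, xs, ys):
    """Return the sum of squared residuals for the line y = m*x + b."""
    total = 0
    for x, y in zip(xs, ys):
        residual = y - (m * x + b)  # y_actual - y_predicted
        total += residual ** 2
    return total

# Two candidate lines, chosen only for illustration.
ssr_a = sum_squared_residuals(4, 62, hours, grades)  # y = 4x + 62
ssr_b = sum_squared_residuals(5, 60, hours, grades)  # y = 5x + 60
print(ssr_a, ssr_b)  # 20 0
```

The second line has the smaller sum, so by the OLS criterion it fits these three points better than the first.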


### OLS Formula

For a simple linear regression with one independent variable `x`, the formulas to calculate `m` and `b` are:


m = [n(Σxy) - (Σx)(Σy)] / [n(Σx²) - (Σx)²]

b = [(Σy)(Σx²) - (Σx)(Σxy)] / [n(Σx²) - (Σx)²]

Here, `n` is the number of data points.
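
As a rough sketch, the two formulas can be applied to the three students from the example in plain Python. The function name `ols_fit` and the variable names are assumptions made for illustration.

```python
# Example data from the post: hours studied and grades (in %).
hours = [2, 4, 6]
grades = [70, 80, 90]

def ols_fit(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    sum_x = sum(xs)                               # Σx
    sum_y = sum(ys)                               # Σy
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # Σxy
    sum_x2 = sum(x * x for x in xs)               # Σx²

    # Shared denominator of both formulas.
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    return m, b

m, b = ols_fit(hours, grades)
print(m, b)  # 5.0 60.0
```

For this data, Σx = 12, Σy = 240, Σxy = 1000, and Σx² = 56, so m = 120 / 24 = 5 and b = 1440 / 24 = 60. The fitted line is `y = 5x + 60`, which happens to pass through all three points exactly.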

### Conclusion

Once you have `m` and `b`, you can plug in any value of `x` (hours studied) to predict `y` (the grade).
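
For the three students in the example, the formulas give `m = 5` and `b = 60`. A small sketch of the prediction step, using a hypothetical helper named `predict_grade`:

```python
# Slope and intercept obtained from the example data (2h→70%, 4h→80%, 6h→90%).
m, b = 5.0, 60.0

def predict_grade(hours_studied, slope=m, intercept=b):
    """Predict a grade (in %) from hours studied using y = m*x + b."""
    return slope * hours_studied + intercept

print(predict_grade(5))  # 85.0
print(predict_grade(3))  # 75.0
```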

In summary, OLS finds the line that best fits your data by minimizing the sum of the squared vertical distances (the residuals) between the actual data points and the values the line predicts. This line can then be used to make predictions.
