
Sunday, December 29, 2024

Housing Prices vs. Average Number of Rooms: Inliers and Outliers Analysis with RANSAC

The task involves analyzing a dataset that contains information about housing prices and certain features related to the housing market. Specifically, the dataset provides the average number of rooms per dwelling and the median value of owner-occupied homes in $1000's. The goal is to identify relationships between these variables, while addressing the presence of data anomalies, such as outliers.

The dataset contains a mix of *inliers* (data points that fit the general trend or pattern) and *outliers* (data points that deviate significantly from the expected pattern). The presence of outliers can skew the results of any predictive modeling or analysis. Therefore, the aim is to visualize and model the relationship between the number of rooms and the home prices while excluding these outliers to get a more accurate model of the underlying trend.

### Solution:

In the solution, a scatter plot is used to visualize the data, with different markers for inliers and outliers:

- **Inliers**: These are the data points that follow the general trend of the relationship between the number of rooms and the home prices. They are shown as blue circles on the plot.
  
- **Outliers**: These are the data points that do not follow the expected pattern and are significantly different from the inliers. They are represented as brown squares.

A **RANSAC (Random Sample Consensus)** regression line is plotted in red on the graph. RANSAC is a robust method for fitting a model to data that may contain outliers. It helps identify the best fit line that excludes outliers, thereby providing a more accurate representation of the underlying relationship between the two variables (average number of rooms and median home price).

The plot clearly shows the main trend of home prices increasing with the number of rooms, while also distinguishing between valid data points (inliers) and those that do not fit the pattern (outliers). The red line represents the model derived from the inliers, which is less influenced by the outliers, resulting in a more reliable analysis of the relationship between the variables.
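The analysis above can be sketched with scikit-learn's RANSACRegressor. The data below is synthetic, generated only to stand in for the housing dataset (the slope, intercept, and noise levels are illustrative assumptions, not values from the original data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

rng = np.random.default_rng(0)
avg_rooms = rng.uniform(4, 9, size=100)                  # average rooms per dwelling
price = 9 * avg_rooms - 25 + rng.normal(0, 1, size=100)  # median value in $1000's
price[:10] = rng.uniform(-20, -5, size=10)               # inject gross outliers

# Fit a line robustly: points farther than residual_threshold are treated as outliers
ransac = RANSACRegressor(LinearRegression(), residual_threshold=3.0, random_state=0)
ransac.fit(avg_rooms.reshape(-1, 1), price)

inlier_mask = ransac.inlier_mask_  # True for inliers, False for outliers
print("Slope:", ransac.estimator_.coef_[0])
print("Inliers:", inlier_mask.sum(), "Outliers:", (~inlier_mask).sum())
```

The `inlier_mask_` attribute is what allows the scatter plot to draw inliers and outliers with different markers, while `estimator_` holds the red regression line fitted only to the consensus set.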

### Summary:
- The plot highlights the relationship between the average number of rooms and the median home price.
- Outliers are identified and differentiated from the inliers.
- The RANSAC regression line offers a robust fit to the data, ensuring that the relationship between rooms and home prices is accurately modeled despite the presence of outliers.

Wednesday, November 13, 2024

How Line Fitting Works in Image Processing and Computer Vision



Introduction

If you've ever drawn a straight line through scattered points, you've already performed line fitting. In computer vision, this concept allows machines to detect structure in visual data.

💡 Line fitting helps computers convert noisy visual data into meaningful structure.

What is Line Fitting?

Line fitting is the process of finding a line that best represents a group of data points. In practice, these points rarely align perfectly; common causes of deviation include:

  • Sensor noise
  • Lighting variation
  • Measurement errors
  • Environmental disturbances

📊 Mathematics of Line Fitting

The equation of a line is:

$$ y = mx + b $$

Where:

  • \( m \) = slope
  • \( b \) = intercept

The goal is to minimize error between actual and predicted values.

Error Function

$$ E = \sum_{i=1}^{N} (y_i - (mx_i + b))^2 $$

This is called the least squares error.
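As a quick check of the error function, one can evaluate \( E \) for a candidate line on a handful of points (the numbers here are made up for illustration):

```python
import numpy as np

# Illustrative points (they lie exactly on y = 2x + 1)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

def least_squares_error(m, b):
    # E = sum_i (y_i - (m*x_i + b))^2
    return np.sum((y - (m * x + b)) ** 2)

print(least_squares_error(2.0, 1.0))  # perfect fit, so E = 0.0
print(least_squares_error(2.0, 0.0))  # every point misses by 1, so E = 4.0
```

Squaring each residual is what makes a line that misses every point by 1 cost exactly 4 here, regardless of the sign of the miss.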


Least Squares Method

This method minimizes the squared error between points and the fitted line.

Formula for Slope

$$ m = \frac{N\sum xy - \sum x \sum y}{N\sum x^2 - (\sum x)^2} $$

Formula for Intercept

$$ b = \frac{\sum y - m\sum x}{N} $$

These closed-form formulas give the slope and intercept that minimize the squared error over all possible lines.

Squaring ensures:

  • No negative cancellation
  • Penalizes large errors more
  • Smooth optimization function

RANSAC Method

RANSAC is used when data contains outliers.

Mathematical Idea

Instead of minimizing all errors, RANSAC maximizes inliers:

$$ \text{Maximize} \quad |\{i : |y_i - (mx_i + b)| < \epsilon \}| $$

Where \( \epsilon \) is a tolerance threshold.

  1. Select random subset
  2. Fit model
  3. Count inliers
  4. Repeat
  5. Choose best model
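The five steps above can be sketched as a bare-bones RANSAC loop for 2D line fitting. The parameter choices (iteration count, tolerance `epsilon`) and the sample data are illustrative, not canonical:

```python
import numpy as np

def ransac_line(x, y, n_iters=200, epsilon=0.5, seed=None):
    rng = np.random.default_rng(seed)
    best_m, best_b, best_inliers = 0.0, 0.0, -1
    for _ in range(n_iters):                              # 4. repeat
        i, j = rng.choice(len(x), size=2, replace=False)  # 1. select random subset
        if x[i] == x[j]:
            continue                                      # skip vertical pairs
        m = (y[j] - y[i]) / (x[j] - x[i])                 # 2. fit model to the pair
        b = y[i] - m * x[i]
        inliers = np.sum(np.abs(y - (m * x + b)) < epsilon)  # 3. count inliers
        if inliers > best_inliers:                        # 5. keep the best model
            best_m, best_b, best_inliers = m, b, inliers
    return best_m, best_b, best_inliers

# Points on y = 2x, plus two gross outliers at the end
x = np.array([0, 1, 2, 3, 4, 5, 1, 4], dtype=float)
y = np.array([0, 2, 4, 6, 8, 10, 9, -5], dtype=float)
m, b, n_in = ransac_line(x, y, seed=0)
print(m, b, n_in)  # expect roughly m = 2, b = 0 with 6 inliers
```

Note that each candidate line is fit to a minimal subset (two points); a production implementation would typically refit the line to all inliers of the best model at the end.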

💻 Practical Example (Python)

Code Example

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# Least squares estimates of slope and intercept
m = (len(x) * np.sum(x * y) - np.sum(x) * np.sum(y)) / (len(x) * np.sum(x * x) - np.sum(x) ** 2)
b = (np.sum(y) - m * np.sum(x)) / len(x)

print("Slope:", m)
print("Intercept:", b)

Output

Slope: 0.6
Intercept: 2.2

Applications

  • Self-driving cars (lane detection)
  • Edge detection
  • Robotics navigation
  • Medical imaging
  • Augmented reality

🎯 Key Takeaways

  • Line fitting extracts structure from noisy data
  • Least squares minimizes error globally
  • RANSAC handles outliers effectively
  • Math ensures optimal fitting

Conclusion

Line fitting is a fundamental concept bridging mathematics and computer vision. It allows machines to interpret visual data efficiently and reliably.

Whether using least squares or RANSAC, understanding the math behind the method gives deeper insight into how machines "see" the world.
