K-Nearest Neighbors (KNN) & Euclidean Distance
Introduction
KNN is a simple algorithm that classifies a new data point based on how similar it is to its nearest labeled neighbors.
Euclidean Distance
Formula:
\[ d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \]
This represents the straight-line distance between two points.
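As a quick sanity check, the formula can be evaluated directly in Python. The points used here are illustrative, chosen to form a 3-4-5 right triangle:

```python
import math

# Straight-line distance between (1, 2) and (4, 6)
dx = 4 - 1                        # horizontal difference
dy = 6 - 2                        # vertical difference
d = math.sqrt(dx**2 + dy**2)
print(d)                          # 5.0 (a 3-4-5 right triangle)

# math.dist (Python 3.8+) computes the same quantity
print(math.dist((1, 2), (4, 6)))  # 5.0
```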
Mathematical Deep Dive
This comes from the Pythagorean theorem:
\[ a^2 + b^2 = c^2 \]
- a = horizontal difference
- b = vertical difference
- c = distance
\[ d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \]
Why squaring?
Squaring removes the sign of each difference, so every term contributes a non-negative amount and the distance is always positive. It also penalizes large differences more heavily than small ones.
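A one-line check makes this concrete: the squared difference is the same regardless of subtraction order, so the distance cannot come out negative.

```python
x1, x2 = 1, 4

# Squaring makes the contribution non-negative either way:
print((x2 - x1)**2)  # 9
print((x1 - x2)**2)  # 9
```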
Worked Example
- A (1,2)
- B (2,3)
- C (4,5)
- New (3,3)
\[ d(A) = \sqrt{(3-1)^2 + (3-2)^2} = \sqrt{4 + 1} = \sqrt{5} \approx 2.24 \]
\[ d(B) = \sqrt{(3-2)^2 + (3-3)^2} = \sqrt{1 + 0} = 1 \]
\[ d(C) = \sqrt{(3-4)^2 + (3-5)^2} = \sqrt{1 + 4} = \sqrt{5} \approx 2.24 \]
KNN Classification
- Closest: B (distance 1)
- Next: A and C, tied at \(\sqrt{5} \approx 2.24\)
With \(k = 1\), the prediction follows the nearest neighbor, B: Class 1.
Higher Dimensions
\[ d = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2} \]
This allows KNN to work with multiple features.
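The same computation works unchanged for any number of features; `np.linalg.norm` of the difference vector is a common shorthand (the 4-feature points below are illustrative):

```python
import numpy as np

p = np.array([1.0, 2.0, 3.0, 4.0])   # a point with 4 features
q = np.array([2.0, 4.0, 6.0, 8.0])

# Explicit formula: square root of the summed squared differences
d1 = np.sqrt(np.sum((p - q)**2))

# Equivalent shorthand
d2 = np.linalg.norm(p - q)

print(d1, d2)  # both equal sqrt(30), about 5.477
```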
CLI Example
Code

```python
import numpy as np

def distance(p1, p2):
    return np.sqrt(np.sum((np.array(p1) - np.array(p2))**2))

print(distance([3, 3], [1, 2]))
print(distance([3, 3], [2, 3]))
print(distance([3, 3], [4, 5]))
```

Output

```
$ python knn.py
2.23606797749979
1.0
2.23606797749979
```
Explanation
The function computes the squared differences element-wise, sums them, and takes the square root: the Euclidean formula applied to NumPy arrays.
Key Takeaways
- Distance = similarity measure
- KNN uses nearest neighbors
- Works in any dimension
- Based on geometry
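Putting the pieces together, a minimal KNN classifier is only a few lines. This is a sketch, not a production implementation; the training points and class labels below are hypothetical, chosen so that the worked example's query point (3, 3) lands in Class 1:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    # Euclidean distance from x to every training point
    dists = np.sqrt(np.sum((X_train - np.asarray(x, dtype=float))**2, axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    votes = [y_train[i] for i in nearest]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical labeled points (labels are illustrative)
X = [[1, 2], [2, 3], [4, 5], [5, 6]]
y = ["Class 1", "Class 1", "Class 2", "Class 2"]
print(knn_predict(X, y, [3, 3], k=3))  # Class 1
```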
Final Thoughts
Euclidean distance turns raw feature values into measurable relationships between points, which is the foundation of similarity-based methods like KNN.