Understanding Hamming Distance in k-Nearest Neighbors (k-NN)
Table of Contents
- What is Hamming Distance?
- Why Use Hamming Distance?
- How Hamming Distance Works
- Comparison with Other Distance Metrics
- Hamming Distance in k-NN
- Example: Classifying a New Point
- Code Example & CLI Simulation
- Key Takeaways
- Related Articles
What is Hamming Distance?
Hamming distance measures the number of differences between two strings of equal length. If two strings have the same characters in the same positions, their Hamming distance is zero. If every character differs, the distance equals the string length.
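This definition maps directly onto a few lines of Python (a minimal sketch; the helper name `hamming_distance` is our own choice). Because the metric is only defined for strings of equal length, the sketch rejects mismatched inputs:

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))

print(hamming_distance("abcde", "abcde"))  # identical strings -> 0
print(hamming_distance("10101", "01010"))  # every position differs -> 5
```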
Why Use Hamming Distance?
In machine learning:
- Euclidean distance is not meaningful for binary or categorical features, whose values have no numeric magnitude or ordering.
- Hamming distance provides a simple way to quantify dissimilarity between such features.
- It allows classification algorithms like k-NN to work with non-numeric data effectively.
How Hamming Distance Works
Consider two binary strings:
String A: 101010
String B: 100100
Step-by-step comparison:
- Position 1: 1 vs 1 → same
- Position 2: 0 vs 0 → same
- Position 3: 1 vs 0 → different
- Position 4: 0 vs 1 → different
- Position 5: 1 vs 0 → different
- Position 6: 0 vs 0 → same
Total differences = 3 → Hamming distance = 3
Aligned, an X marks each differing position:

```
101010
100100
  XXX
```
Comparison with Other Distance Metrics
Different metrics suit different types of data:
| Metric | Use Case | Notes |
|---|---|---|
| Hamming Distance | Binary/Categorical | Counts differing positions exactly |
| Euclidean Distance | Continuous numeric | Geometric straight-line distance |
| Manhattan Distance | Numeric, grid-like | Sum of absolute differences |
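The three metrics in the table can be compared side by side on the same pair of vectors (a plain-Python sketch):

```python
import math

a = [1, 0, 1, 0]
b = [1, 1, 0, 0]

# Hamming: count of positions that differ (binary/categorical features)
hamming = sum(x != y for x, y in zip(a, b))
# Euclidean: straight-line distance (continuous numeric features)
euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
# Manhattan: sum of absolute coordinate differences (grid-like numeric data)
manhattan = sum(abs(x - y) for x, y in zip(a, b))

print(hamming, euclidean, manhattan)  # 2 1.4142135623730951 2
```

Note that on strictly binary vectors, Hamming and Manhattan distances coincide and Euclidean distance is their square root, so the choice of metric matters most once features become categorical or continuous.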
Hamming Distance in k-Nearest Neighbors (k-NN)
The k-NN algorithm classifies a new point by comparing it to existing data points. Hamming distance identifies which points are most similar.
1. Represent each object as a binary string, e.g., "101010". Each string encodes features such as Yes/No flags or categorical attributes.
2. Compute the Hamming distance from the new point to each point in the dataset.
3. Sort the distances and select the k smallest values to identify the nearest neighbors.
4. Assign the new point the class held by the majority of those nearest neighbors.
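The steps above can also be sketched with scikit-learn rather than a hand-rolled loop (an assumption: scikit-learn is installed). `KNeighborsClassifier` accepts `metric="hamming"`, which scipy computes as the fraction of differing positions; for equal-length vectors this scaling leaves the neighbor ordering unchanged. We use k=3 here so the majority vote cannot tie:

```python
from sklearn.neighbors import KNeighborsClassifier

# Same toy dataset as the worked example below, as 0/1 feature vectors
X = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1], [1, 0, 0, 1]]
y = ["A", "B", "A", "B"]

# k=3 (odd) avoids a tied vote on this two-class dataset
clf = KNeighborsClassifier(n_neighbors=3, metric="hamming")
clf.fit(X, y)
print(clf.predict([[1, 0, 1, 1]]))  # -> ['B']
```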
Example: Classifying a New Point
Dataset:
| Data Point | Class |
|---|---|
| 1100 | A |
| 1010 | B |
| 1111 | A |
| 1001 | B |
New point: 1011
Hamming distances:
- To 1100: 2
- To 1010: 1
- To 1111: 3
- To 1001: 2
If k=2, the nearest neighbors are "1010" (Class B, distance 1) and "1100" (Class A, distance 2). The vote is tied one-to-one, so we break the tie in favor of the closer neighbor: "1010" at distance 1 wins, and the new point is classified as Class B.
Code Example & CLI Simulation
Python implementation:
```python
from collections import Counter

def hamming_distance(a, b):
    """Compute the Hamming distance between two equal-length strings."""
    return sum(el1 != el2 for el1, el2 in zip(a, b))

data = [("1100", "A"), ("1010", "B"), ("1111", "A"), ("1001", "B")]
new_point = "1011"

# Compute the distance from the new point to every data point
distances = [(point, cls, hamming_distance(new_point, point)) for point, cls in data]
distances.sort(key=lambda x: x[2])

# Choose the k nearest neighbors
k = 2
nearest_neighbors = distances[:k]
print("Nearest Neighbors:", nearest_neighbors)

# Majority vote; a tie resolves in favor of the closer neighbor,
# because Counter preserves the distance-sorted insertion order
vote = Counter(cls for _, cls, _ in nearest_neighbors)
classification = vote.most_common(1)[0][0]
print("Classified as:", classification)
```
CLI Output Simulation:

```
Nearest Neighbors: [('1010', 'B', 1), ('1100', 'A', 2)]
Classified as: B
```
Key Takeaways
- Hamming distance counts differing positions between equal-length strings.
- It is ideal for binary or categorical data, unlike Euclidean or Manhattan distance.
- In k-NN, it identifies the nearest neighbors to classify new points effectively.
- Majority voting among neighbors determines the predicted class.
- Understanding distance metrics is critical for selecting the right algorithm for your data type.