
Friday, November 29, 2024

Few-Shot and Zero-Shot Learning in Computer Vision: Teaching AI with Minimal Data



🧠 Few-Shot vs Zero-Shot Learning – Learn AI Like a Human

Imagine teaching a child to recognize animals. Show them one giraffe, and they recognize many. Describe a unicorn, and they can identify it without ever seeing one.

This is exactly how few-shot and zero-shot learning work in AI.




📸 What Is Few-Shot Learning?

Few-shot learning means learning from very few examples.

Example: Showing just 2–3 panda images and still recognizing pandas later.
  • Uses existing knowledge
  • Works with limited data
  • Generalizes quickly

🦄 What Is Zero-Shot Learning?

Zero-shot learning means recognizing something without seeing it before.

Example: “A horse with a horn” → identifying a unicorn without training images.
  • No training examples needed
  • Uses descriptions
  • Relies on understanding relationships

📐 Math Explained in Easy Language

1. Distance Measurement (Few-Shot)

\[ \text{Distance} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \]

Explanation:

This measures how far apart two images are in feature space, which tells you how similar they are.

  • Small distance → very similar
  • Large distance → very different
Think of it like comparing faces: the closer the features, the more likely it is the same person.
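
As a tiny sketch, here is that distance computed in Python for two made-up 2-D feature vectors (the numbers are invented for illustration, not real model outputs):

import math

# Two made-up 2-D feature vectors (think of them as compressed image features)
image_a = (1.0, 2.0)
image_b = (1.5, 2.5)

# Euclidean distance: sqrt((x1 - x2)^2 + (y1 - y2)^2)
distance = math.sqrt((image_a[0] - image_b[0]) ** 2 + (image_a[1] - image_b[1]) ** 2)
print(distance)  # ~0.71 -> small distance, so the two images are similar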

2. Probability Prediction

\[ P(\text{class} \mid \text{image}) \]

This means: “What is the probability this image belongs to a class?”

3. Softmax Function

\[ \text{Softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} \]

👉 Converts scores into probabilities.

Higher score = higher chance of being correct.
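
Here is a minimal Python sketch of the same formula, using made-up scores for two labels:

import math

# Made-up similarity scores for two candidate labels ("a dog", "a cat")
scores = [2.0, 0.5]

# Softmax: exponentiate each score, then divide by the sum of the exponentials
exp_scores = [math.exp(s) for s in scores]
probs = [e / sum(exp_scores) for e in exp_scores]
print(probs)  # ~[0.82, 0.18] -> the higher score becomes the higher probability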

⚙️ How These Models Work

Few-Shot Learning

  1. Learn general features
  2. Create class prototypes
  3. Compare new images to prototypes
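
A rough Python sketch of those three steps, assuming a pretrained backbone has already turned each image into a feature vector (the vectors below are made-up numbers):

import numpy as np

# Step 1: features from a pretrained model (made-up 2-D embeddings, 2 examples per class)
support_set = {
    "panda": np.array([[0.9, 0.1], [0.8, 0.2]]),
    "tiger": np.array([[0.1, 0.9], [0.2, 0.8]]),
}

# Step 2: each class prototype is simply the mean of its few examples
prototypes = {label: feats.mean(axis=0) for label, feats in support_set.items()}

# Step 3: classify a new image by picking the nearest prototype
new_image = np.array([0.85, 0.15])
prediction = min(prototypes, key=lambda label: np.linalg.norm(new_image - prototypes[label]))
print(prediction)  # "panda"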

Zero-Shot Learning

  1. Convert text → numbers
  2. Convert images → numbers
  3. Match both in same space

💻 Code Example

from transformers import CLIPProcessor, CLIPModel
from PIL import Image

# Load the pretrained CLIP model and its processor
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Prepare the image and the candidate text labels
image = Image.open("animal.jpg")
inputs = processor(text=["a cat", "a dog"], images=image, return_tensors="pt")

# Score the image against each label and convert the scores to probabilities
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)
print(probs)
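
Here logits_per_image holds the raw image-to-text similarity scores, and the softmax in the last step turns them into the class probabilities shown in the output below.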

🖥️ CLI Output

Input Image: animal.jpg
Predictions:
a cat: 0.12
a dog: 0.88

📊 Few-Shot vs Zero-Shot

Feature       | Few-Shot      | Zero-Shot
Training Data | Few examples  | No examples
Learning Type | From examples | From descriptions
Flexibility   | Moderate      | Very high

🧩 Interactive Learning

What happens if the examples are poor?

Few-shot learning may fail because the few examples do not represent the class well.

What if the description is unclear?

Zero-shot models may misclassify because the description is ambiguous.


💡 Key Takeaways

  • Few-shot = learn from small data
  • Zero-shot = learn from descriptions
  • Both use transfer learning
  • Math focuses on similarity and probability

🎯 Final Thoughts

Few-shot and zero-shot learning bring AI closer to human intelligence. Instead of memorizing, models learn patterns and concepts.

This shift makes AI faster, smarter, and far more adaptable.
