Tuesday, October 29, 2024

How Image Formation Works in Computer Vision: A Beginner’s Guide


📷 Image Formation in Computer Vision (Explained Simply)

Computer vision is about teaching machines to “see.” But before we can teach a computer to understand images, it helps to know how images are formed.

This guide explains the entire process—from light to pixels—in a simple, structured way.



🧠 What is Image Formation?

Image formation is the process by which light from a scene is captured and converted into a digital image.

  • Human eye → Brain processes light
  • Camera → Sensor processes light into pixels

Both systems work similarly: they convert light into interpretable information.


🌟 Step 1: Light and the Scene

Everything starts with light.

  • Light hits objects
  • Objects reflect light
  • Reflected light enters the camera

The intensity and color of light determine what the image looks like.

If we represent light intensity mathematically:

\[ I(x, y) = \text{light intensity at pixel } (x, y) \]

This means every pixel stores brightness information.
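
As a minimal sketch of what \(I(x, y)\) means in practice, the snippet below stores a tiny grayscale image as a nested Python list and looks up one pixel. The grid values and the helper name `intensity_at` are made up for illustration. The row-first indexing (`img[y][x]`) and the 0 to 255 brightness range are common conventions assumed here, not something stated above.

```python
# A hypothetical 3x4 grayscale image: 3 rows (y) and 4 columns (x).
# Values are brightness levels; 0 = black, 255 = white (8-bit range assumed).
image = [
    [0,  50, 100, 150],
    [10, 60, 110, 160],
    [20, 70, 120, 170],
]

def intensity_at(img, x, y):
    """Return I(x, y): the brightness stored at column x, row y."""
    return img[y][x]

height = len(image)      # number of rows
width = len(image[0])    # number of columns
print(width, height)              # 4 3
print(intensity_at(image, 2, 1))  # 110
```

Note that the row index comes first when reading the list, so \(I(2, 1)\) is `image[1][2]`.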


🔍 Step 2: Camera Lens

The lens focuses light onto the sensor.

  • Without a lens → blurry image
  • With a proper lens → sharp image

Refraction (Light Bending)

Light bends when passing through the lens.

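This bending brings the rays back together: rays arriving parallel to the lens axis meet at a point called the focal point, and rays from a single point in the scene meet again at a single point on the sensor when the image is in focus.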
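
How much a ray bends is commonly described by Snell's law, \(n_1 \sin\theta_1 = n_2 \sin\theta_2\), which this guide does not derive. As a rough sketch, assuming light passes from air (refractive index about 1.0) into glass (about 1.5, an illustrative value), the angle inside the glass could be computed like this. The helper name `refraction_angle_deg` is hypothetical.

```python
import math

def refraction_angle_deg(incidence_deg, n1=1.0, n2=1.5):
    """Angle of the refracted ray in degrees, from Snell's law.

    n1 and n2 are refractive indices. The defaults (air -> glass) are
    illustrative assumptions, not values from the article.
    """
    s = n1 * math.sin(math.radians(incidence_deg)) / n2
    if abs(s) > 1.0:
        # Can only happen when going from a denser to a lighter medium.
        raise ValueError("total internal reflection: no refracted ray")
    return math.degrees(math.asin(s))

# Steeper incoming rays are bent more strongly toward the surface normal.
for angle in (0, 15, 30, 45):
    print(angle, round(refraction_angle_deg(angle), 2))
```

A ray hitting the glass head-on (0 degrees) is not bent at all, which is why the curved shape of a lens matters: different parts of the surface bend rays by different amounts.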


📡 Step 3: Image Sensor

The sensor is made of millions of pixels.

Each pixel measures light intensity.

Pixel Function:

\[ \text{Pixel value} = f(\text{incoming light intensity}) \]

So the image becomes a grid of numbers (matrix).

Example:

[[12, 45, 78], [34, 90, 120], [10, 60, 200]]

This matrix is what a computer actually sees.
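
To make the pixel function concrete, here is a minimal sketch that turns light readings into a grid of numbers. It assumes each reading is a fraction between 0.0 (no light) and 1.0 (sensor fully saturated) and that the sensor stores 8-bit values (0 to 255). Both are illustrative assumptions, and `pixel_value` and `capture` are hypothetical names. Real sensors also deal with noise and color filters, which are left out here.

```python
def pixel_value(light, levels=256):
    """f(incoming light): map a reading in [0.0, 1.0] to an integer level."""
    light = min(max(light, 0.0), 1.0)   # clip: a pixel cannot store less than 0 or more than full
    return round(light * (levels - 1))

def capture(light_grid):
    """Apply the pixel function to every reading, row by row."""
    return [[pixel_value(v) for v in row] for row in light_grid]

# Hypothetical light readings falling on a 2x3 patch of the sensor.
readings = [
    [0.0, 0.25, 0.5],
    [0.75, 1.0, 1.7],   # 1.7 is overexposed and clips to the maximum
]

for row in capture(readings):
    print(row)
```

The result is a small matrix of integers of the same kind as the example above, just produced from explicit light values.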


📉 Step 4: 3D → 2D Projection

A real-world scene is 3D, but images are 2D.

This conversion is called projection.

Mathematically:

\[ (x, y, z) \rightarrow (x', y') \]

Simple Explanation:

A shadow of a ball is 2D, but the ball is 3D.

The same idea applies to cameras: the depth of the scene is flattened onto the image.


📐 Math Behind Image Formation

1. Pinhole Camera Model

\[ x' = f \cdot \frac{x}{z}, \quad y' = f \cdot \frac{y}{z} \]

Easy Explanation:

  • \(x, y, z\) = real-world coordinates of a point, measured from the camera (\(z\) is the distance in front of the camera)
  • \(f\) = focal length
  • \(x', y'\) = image coordinates

👉 Objects farther away (large z) appear smaller.
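
The pinhole formula can be tried out directly. The sketch below is a minimal illustration, assuming coordinates are measured from the camera with \(z\) pointing forward. The function name `project` and the numbers are chosen only for the example.

```python
def project(point, f):
    """Pinhole projection: (x, y, z) -> (x', y') = (f*x/z, f*y/z)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (f * x / z, f * y / z)

f = 2.0  # focal length in arbitrary units (illustrative value)

# The same point, first close to the camera and then five times farther away.
near = project((1.0, 1.0, 2.0), f)
far = project((1.0, 1.0, 10.0), f)

print(near)  # (1.0, 1.0)
print(far)   # (0.2, 0.2)
```

Moving the point five times farther away shrinks its image coordinates by a factor of five, which is the "farther means smaller" effect in numbers.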


2. Light Intensity Model

\[ I = L \cdot R \]

  • L = intensity of the light source
  • R = reflectance of the object (the fraction of incoming light its surface reflects)

👉 Brightness depends on both light and surface.
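
Here is a small sketch of the intensity model \(I = L \cdot R\). It assumes reflectance is a fraction between 0 and 1 and uses made-up light and surface values. `observed_intensity` is a hypothetical helper name.

```python
def observed_intensity(light, reflectance):
    """I = L * R, with reflectance given as a fraction between 0 and 1."""
    if not 0.0 <= reflectance <= 1.0:
        raise ValueError("reflectance must be between 0 and 1")
    return light * reflectance

# Same light, different surfaces (values are illustrative only).
light = 200.0
white_paper = observed_intensity(light, 0.9)
dark_cloth = observed_intensity(light, 0.1)
print(white_paper, dark_cloth)

# Same surface, dimmer light.
dim_paper = observed_intensity(50.0, 0.9)
print(dim_paper)
```

A bright surface under dim light and a dark surface under strong light can end up with similar pixel values, which is one reason a single pixel's brightness does not identify the material on its own.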


📊 Key Concepts

📏 Focal Length

Controls the zoom level of the camera: a longer focal length magnifies the scene.

👁️ Field of View

How much of the scene is visible in the image. A shorter focal length gives a wider field of view.
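
Focal length and field of view are linked. For an idealized pinhole camera, a standard relation is FOV = 2·atan(w / (2f)), where w is the sensor width and f is the focal length. This formula is not stated in this guide and is added here as a sketch under that assumption. The 36 mm sensor width and the focal lengths below are illustrative values, and `field_of_view_deg` is a hypothetical name.

```python
import math

def field_of_view_deg(sensor_width, focal_length):
    """Horizontal field of view, in degrees, of an ideal pinhole camera."""
    return math.degrees(2 * math.atan(sensor_width / (2 * focal_length)))

# Same sensor, three focal lengths (all in millimetres).
for f_mm in (18, 50, 200):
    print(f_mm, round(field_of_view_deg(36.0, f_mm), 1))
```

The printed angles shrink as the focal length grows: zooming in means seeing a narrower slice of the scene.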

💡 Aperture

Controls how much light enters the camera.

🌗 Depth of Field

The range of distances that appear in sharp focus in the image.

  • Shallow depth → blurred background
  • Deep depth → everything sharp

⚙️ Putting It All Together

  1. Light reflects from objects
  2. Lens focuses light
  3. Sensor captures light as pixels
  4. 3D world becomes 2D image

The final output is a matrix of numbers that represents an image.
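
The four steps can be strung together in a toy example. This is only a sketch under simplifying assumptions: the lens is idealized as a pinhole, brightness follows \(I = L \cdot R\), pixels are 8-bit, and each scene point lands in its nearest pixel with no handling of one object hiding another. The function `render` and its `scale` parameter (pixels per unit on the image plane) are hypothetical.

```python
def render(points, light, f, width, height, scale):
    """Tiny end-to-end sketch: reflect -> project -> store as pixels.

    points: list of (x, y, z, reflectance) tuples in camera coordinates.
    scale:  pixels per unit on the image plane (hypothetical parameter).
    """
    image = [[0] * width for _ in range(height)]      # start with a black image
    for x, y, z, reflectance in points:
        if z <= 0:
            continue                                  # behind the camera: not visible
        # 3D -> 2D projection with the pinhole model.
        xp, yp = f * x / z, f * y / z
        # Shift so (0, 0) lands at the image centre, then pick the nearest pixel.
        col = round(xp * scale + (width - 1) / 2)
        row = round((height - 1) / 2 - yp * scale)    # image rows grow downward
        if 0 <= row < height and 0 <= col < width:
            # Brightness I = L * R, stored as an 8-bit number.
            image[row][col] = min(255, round(light * reflectance))
    return image

# Two hypothetical scene points: one straight ahead, one up and to the right.
scene = [(0.0, 0.0, 4.0, 1.0), (1.0, 1.0, 2.0, 0.5)]
image = render(scene, light=200, f=2.0, width=5, height=5, scale=1)
for row in image:
    print(row)
```

The output is a 5x5 matrix that is black everywhere except for two pixels, one per scene point, with brightness set by the light and each point's reflectance.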


💡 Key Takeaways

  • Images are made of light information
  • Cameras convert light into digital pixels
  • Mathematics helps map 3D → 2D
  • Every image is just a matrix of numbers

🎯 Final Insight

Image formation is the foundation of computer vision. Without it, AI systems would not be able to interpret the world visually.

Understanding this process helps in areas like:

  • Autonomous driving 🚗
  • Facial recognition 😊
  • Medical imaging 🏥
  • Robotics 🤖
