Digital Image Foundations

How computers perceive images

2026-03-13


How We See: Light and The Human Eye

  • 1. Illumination: Light from a source (e.g., the sun) hits an object.
  • 2. Reflection: The object absorbs some colors and reflects others.
  • 3. Capture: Reflected light enters the eye through the pupil.
  • 4. Processing: The lens focuses light onto the retina. Photoreceptors (rods and cones) convert it to electrical signals for the brain.

How Computers “See”: The Digital Camera

  • 1. Capture: Just like the eye, the camera captures reflected light.
  • 2. Lens & Aperture: Light enters through an opening (aperture) and is focused by glass lenses.
  • 3. The Sensor: Instead of a retina, light hits a digital sensor (CMOS/CCD).
  • 4. Digitization: Millions of sensor pixels convert incoming photons into an electrical charge, which is translated into a digital matrix.

The Digital Sensor: Capturing Light

  • Photodiodes: Each pixel on a sensor forms a tiny bucket collecting photons (light).
  • Charge: More light → more photons → higher electrical charge.
  • Bayer Filter: Sensors capture only light intensity (grayscale). To get color, a microscopic mosaic filter is placed over them so each pixel records only Red, Green, or Blue.
  • Digitization: An Analog-to-Digital Converter (ADC) turns the electrical charge into a discrete number, typically from 0 to 255.
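The digitization step can be sketched in NumPy. This is a conceptual illustration only; the charge values here are made up, and a real ADC works on analog voltages, not arrays:

```python
import numpy as np

# Simulated per-pixel charge, as a fraction of the sensor's full-well capacity
charge = np.array([0.0, 0.25, 0.5, 1.0])

# 8-bit quantization: map the continuous range onto 256 discrete levels
digital = np.clip(np.round(charge * 255), 0, 255).astype(np.uint8)
print(digital)  # [  0  64 128 255]
```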

Grayscale Images: Pixels & Coordinates

  • Images are 2D matrices of numbers. The origin (0,0) is top-left.
  • Example: an 8x8 matrix where 0 is black and 1 is white.
import numpy as np

# 8x8 matrix (0=black, 1=white)
img = np.array([
    [0,0,0,0,0,0,0,0],
    [0,1,1,0,0,1,1,0],
    [0,1,1,0,0,1,1,0],
    [0,0,0,0,0,0,0,0],
    [0,1,0,0,0,0,1,0],
    [0,0,1,1,1,1,0,0],
    [0,1,0,0,0,0,1,0],
    [0,0,0,0,0,0,0,0]
])

Color Images: The RGB Channels

  • Images are 3D arrays (height × width × 3 channels).
  • RGB: Red, Green, Blue. Each channel value ranges from 0 (dark) to 255 (full intensity).
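A tiny example of this layout in NumPy (the pixel values are arbitrary, chosen just to show the channel structure):

```python
import numpy as np

# A 2x2 color image: height x width x 3 (R, G, B)
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = [255, 0, 0]      # top-left pixel: pure red
img[1, 1] = [255, 255, 255]  # bottom-right pixel: white (all channels full)

# Splitting the three channels
r, g, b = img[..., 0], img[..., 1], img[..., 2]
print(img.shape)  # (2, 2, 3)
```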

Essential Image Libraries

  • OpenCV: Fast C++ library with Python bindings. Uses BGR channel order by default. Great for video streams and low-level transforms.
  • Pillow (PIL): The standard Python imaging library (the maintained fork of PIL). Uses RGB. Easy to use for simple drawing and resizing.
  • NumPy: The core math library. Images in Python are fundamentally NumPy multi-dimensional arrays (tensors).
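A quick sketch of how Pillow and NumPy interoperate, assuming both are installed (the image here is created in memory rather than loaded from disk):

```python
import numpy as np
from PIL import Image

# Create a 2x2 solid-red image with Pillow, then view it as a NumPy array
pil_img = Image.new("RGB", (2, 2), color=(255, 0, 0))
arr = np.asarray(pil_img)

print(arr.shape)          # (2, 2, 3): height x width x channels
print(arr[0, 0].tolist()) # [255, 0, 0]: Pillow uses RGB order
```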

Image Data Types: uint8 vs Floats

  • Pixels are usually 8-bit unsigned integers (uint8), meaning values range from 0 to 255.
  • 0 is Black. 255 is White (in grayscale) or full intensity (in RGB).
  • Deep Learning: Models like YOLO often prefer normalized inputs. We convert 0-255 to floats between 0.0 and 1.0 by dividing by 255.0.
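A minimal sketch of that conversion (the 1x3 "image" is a stand-in for a real one):

```python
import numpy as np

img_u8 = np.array([[0, 128, 255]], dtype=np.uint8)

# Convert to float32 and scale into [0.0, 1.0] for model input
img_f32 = img_u8.astype(np.float32) / 255.0  # 128 maps to ~0.502
```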

Coordinates vs. Array Indexing
(The “Gotcha”)

  • We think in geometry: (x, y) = (width, height).
  • NumPy thinks in matrices: matrix[row, col] = matrix[y, x].
  • This causes endless confusion when cropping with bounding boxes (x1, y1, x2, y2).
import numpy as np

# A 100x200 grayscale image and a bounding box in (x1, y1, x2, y2) order
img = np.zeros((100, 200), dtype=np.uint8)
x1, y1, x2, y2 = 50, 10, 150, 60

# WRONG (rows indexed by x: crops the wrong area or crashes)
crop = img[x1:x2, y1:y2]

# CORRECT (rows are y, columns are x)
crop = img[y1:y2, x1:x2]  # shape (50, 100)

The “Smurf Effect”: BGR vs. RGB

  • OpenCV historically uses Blue-Green-Red (BGR).
  • Most other libraries (Matplotlib, Pillow) expect Red-Green-Blue (RGB).
  • If you read an image with cv2.imread() and plot it directly in matplotlib, Red and Blue channels are swapped!
  • Faces look blue, skies look red.
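The swap itself is just a reversal of the last axis. A minimal NumPy illustration (a single-pixel "image"; in practice you would use cv2.cvtColor, which does the same channel reordering):

```python
import numpy as np

# One pixel of pure red, stored in BGR order: (B, G, R)
pixel_bgr = np.array([[[0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels
pixel_rgb = pixel_bgr[..., ::-1]
print(pixel_rgb[0, 0].tolist())  # [255, 0, 0] -- red again, now in RGB order
```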

Basic I/O Operations

import cv2
import matplotlib.pyplot as plt

# 1. Read image (OpenCV loads BGR)
img_bgr = cv2.imread('dog.jpg')

# 2. Convert to RGB for accurate display
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)

# 3. Display
plt.imshow(img_rgb)
plt.axis('off')
plt.show()

Image Resolution and Resizing

  • Resolution: Dimensions of the image (Width × Height).
  • Deep Learning models typically require fixed input sizes (e.g., 640x640).
  • Aspect Ratio: The ratio of width to height.
  • Direct Resizing: Squashes or stretches the image, changing the aspect ratio (causes distortion).
  • Letterboxing: Resizes while maintaining aspect ratio, then pads the remaining area with a solid color.
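The letterbox geometry can be sketched with plain NumPy. This is a minimal illustration: the function name is made up, and in practice the resize itself would be done with cv2.resize before pasting into the canvas:

```python
import numpy as np

def letterbox_params(w, h, target):
    """Scale (w, h) to fit inside a target x target square, keeping aspect ratio.
    Returns (new_w, new_h, pad_left, pad_top)."""
    scale = target / max(w, h)
    new_w, new_h = round(w * scale), round(h * scale)
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top

# A 1280x720 frame letterboxed into 640x640: scaled to 640x360, padded top/bottom
new_w, new_h, pad_left, pad_top = letterbox_params(1280, 720, 640)
print(new_w, new_h, pad_left, pad_top)  # 640 360 0 140

# Paste the resized image into a solid canvas (114 is a common gray pad value)
canvas = np.full((640, 640, 3), 114, dtype=np.uint8)
resized = np.zeros((new_h, new_w, 3), dtype=np.uint8)  # stand-in for the resized frame
canvas[pad_top:pad_top + new_h, pad_left:pad_left + new_w] = resized
```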

Conclusion

Wrapping up Image Foundations

Summary

  • Light & Cameras: How digital sensors capture light and turn it into data.
  • Pixels & Channels: Grayscale (2D) vs. Color RGB (3D) representations.
  • Libraries: OpenCV, Pillow, and NumPy basics.
  • Gotchas: BGR vs RGB (The Smurf Effect) and Array Indexing vs Coordinates.

Next Steps

Next up: YOLO Inference

  • Using these images with state-of-the-art Computer Vision models.
  • Running inference directly out-of-the-box.

Q&A

Thank You!

Any questions?