
Custom Data & Training
2026-05-14
From a single feature to a full classifier — built up step by step.
In Part 1: a model is f, learned from data, mapping image → output.
That’s what it does — not how it learns.
This section: how f goes from random guesses to useful predictions.
We’ll build it up from a toy color classifier, then scale to YOLO.
Training isn’t magic; it’s a repetitive cycle of refinement.
We start with a guess and “tune” the knobs until we’re happy with the results.
Task: Is this a red car? — output should eventually be a probability.
Simplest feature: average red pixel value (0-255).
\[f(\text{image}) \approx x_{\text{red}}(\text{image})\]

The average red value lands in [0, 255] — but it’s just a raw number. To learn, we need parameters.
Add weights to calculate a score (\(z\)):
\[z = w \cdot x_{\text{red}} + b\]
Now \(z\) depends on \(w\) and \(b\) — but it’s unbounded: it can go negative or exceed 1.
Problem: We need a function that maps any real number → (0, 1) so the output can be read as a probability. The sigmoid does this:
\[\sigma(z) = \frac{1}{1 + e^{-z}}\]

Prediction: \[f(\text{image}) = \sigma(\underbrace{w \cdot x_{\text{red}} + b}_{\text{score } z})\]
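As a minimal sketch, the prediction formula fits in a few lines of Python. The function names `sigmoid` and `predict` are illustrative, and the values of w and b are the starting guess used later in this section.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real-valued score into a probability-like value."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(x_red: float, w: float, b: float) -> float:
    """Toy classifier: probability that the image shows a red car."""
    z = w * x_red + b  # raw score, unbounded
    return sigmoid(z)

# Example with the starting guess w = 0.06, b = -14.0
p = predict(247, w=0.06, b=-14.0)
print(f"{p:.2f}")
```

Changing w or b changes the output smoothly, which is what makes the knobs tunable.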
We have predictions. From Part 3, we know how to grade them — mAP, Precision, Recall. So why not just train against those?
Discrete (Problem)
mAP, Precision, Recall
Small weight changes → zero change in the metric.
Smooth (Solution)
Training Loss
Every tiny adjustment is mathematically measurable.

So we need a smooth substitute. Enter the loss.
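A small sketch of why thresholded metrics are hard to train against: nudging w slightly leaves accuracy untouched, while a smooth quantity still responds. The four data points and the helper names are hypothetical, and accuracy stands in here for any metric built on hard decisions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical average-red values and labels (1 = red car, 0 = not)
xs = [247, 212, 240, 200]
ys = [1, 0, 1, 0]

def accuracy(w, b):
    """Thresholded metric: counts correct decisions at a 0.5 cutoff."""
    preds = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)

def mean_confidence_in_truth(w, b):
    """Smooth quantity: average probability assigned to the correct label."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += p if y == 1 else 1.0 - p
    return total / len(ys)

# Compare two nearly identical weights
for w in (0.0600, 0.0601):
    print(w, accuracy(w, -14.0), round(mean_confidence_in_truth(w, -14.0), 4))
```

The accuracy column stays the same for both weights, so it gives no hint about which direction to move; the smooth column does.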
We skip deeper technical details to keep the intuition front and center.
To evaluate our model, we need a mathematical definition of “wrongness.”
🤔 Instinct
Measure the raw difference between target and prediction.
\(\text{err} = y - f(x)\)
🚨 The Flaw
Target = \(0\), Prediction = \(1\)
\(0 - 1 = \mathbf{-1}\)
The model seeks the lowest number. It thinks \(-1\) is better than \(0\) (perfect)!
💡 The Fix
A mistake of \(-1\) is just as bad as \(+1\).
Square the difference:
\(\text{err}^2 = (y - f(x))^2\)
Let’s start with a random guess: w = 0.06, b = -14.0
Image A — Red Car
\(x_{\text{red}} \approx 247\)
Image B — Not a Red Car
\(x_{\text{red}} \approx 212\)

Image A (target \(y = 1\)):
Score: \(z = 0.06(247) - 14.0 = 0.82\)
Prediction: \(f(x) = \sigma(0.82) \approx 0.69\)
\(\text{err}^2 = (1 - 0.69)^2 \approx \mathbf{0.10}\)
Image B (target \(y = 0\)):
Score: \(z = 0.06(212) - 14.0 = -1.28\)
Prediction: \(f(x) = \sigma(-1.28) \approx 0.22\)
\(\text{err}^2 = (0 - 0.22)^2 \approx \mathbf{0.05}\)
To get a single metric for the whole dataset, we average these squared errors.
For our two images: Loss = \(\frac{0.10 + 0.05}{2}\) = 0.075
This is called Mean Squared Error (MSE): \[\mathcal{L}_{\text{MSE}} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^2\]
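The same calculation as a short sketch, using the two images and the starting guess from above. Without rounding the per-image errors first, the loss comes out near 0.070 rather than 0.075; the function name `mse` is illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mse(xs, ys, w, b):
    """Mean squared error of the toy classifier over a dataset."""
    errors = [(y - sigmoid(w * x + b)) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

xs = [247, 212]   # Image A, Image B
ys = [1, 0]       # red car, not a red car
loss = mse(xs, ys, w=0.06, b=-14.0)
print(f"{loss:.3f}")
```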
Note: We use MSE here for simplicity to build intuition. In practice, modern classifiers use Cross-Entropy Loss, which penalizes highly confident wrong guesses much more harshly.
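To see the difference in numbers, here is a hedged comparison for a single image whose true label is 1. It assumes the two-class (binary) form of cross-entropy, which suits the toy red-car task; the function names are illustrative.

```python
import math

def squared_error(y, p):
    return (y - p) ** 2

def binary_cross_entropy(y, p):
    """Negative log-likelihood of the true label under probability p."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# True label is 1 (red car); predictions go from good to confidently wrong
for p in (0.9, 0.5, 0.1, 0.01):
    print(p, round(squared_error(1, p), 3), round(binary_cross_entropy(1, p), 3))
```

Squared error can never exceed 1 here, while cross-entropy keeps growing as the prediction approaches the wrong extreme.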
Two knobs (w, b). The gradient points where loss rises fastest, so the negative gradient points where it drops fastest. Step a little that way. Repeat. → gradient descent.
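A minimal sketch of that loop on the two-image toy problem, with the gradients worked out by hand via the chain rule. The learning rate and step count are hypothetical; the rate is kept tiny because the raw 0-255 feature is not normalized, so progress is slow but steady.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

xs = [247, 212]
ys = [1, 0]

def loss_and_grads(w, b):
    """MSE loss and its partial derivatives with respect to w and b."""
    n = len(xs)
    loss = grad_w = grad_b = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        loss += (y - p) ** 2 / n
        # Chain rule: d(loss)/dz = 2 (p - y) * p (1 - p)
        dz = 2 * (p - y) * p * (1 - p) / n
        grad_w += dz * x
        grad_b += dz
    return loss, grad_w, grad_b

w, b = 0.06, -14.0
lr = 1e-5  # hypothetical step size
start_loss, _, _ = loss_and_grads(w, b)
for step in range(200):
    _, gw, gb = loss_and_grads(w, b)
    w -= lr * gw  # step against the gradient
    b -= lr * gb
end_loss, _, _ = loss_and_grads(w, b)
print(f"{start_loss:.4f} -> {end_loss:.4f}")
```

Real frameworks compute these gradients automatically, but the update rule is the same idea.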
Our toy: 2 weights, 1 feature (avg red).
YOLO — same idea, scaled:
One loss isn’t enough → YOLO sums several, one per question.
Total loss = weighted sum of components, each catching a different mistake:
| Component | Also called | What it penalizes |
|---|---|---|
| Box Loss | box_loss | Wrong coordinates |
| Class Loss | cls_loss | Wrong label |
\[\mathcal{L}_{\text{total}} = \lambda_{\text{box}}\mathcal{L}_{\text{box}} + \lambda_{\text{cls}}\mathcal{L}_{\text{cls}}\]
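In code, the weighted sum is a one-liner. The loss values and weights below are illustrative numbers only, not measurements from a real run.

```python
def total_loss(components, weights):
    """Weighted sum of named loss components."""
    return sum(weights[name] * value for name, value in components.items())

# Illustrative per-batch loss values and weights
components = {"box_loss": 1.2, "cls_loss": 0.8}
weights = {"box_loss": 7.5, "cls_loss": 0.5}
print(total_loss(components, weights))
```

Raising one weight makes the optimizer care more about that kind of mistake.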
Formatting your data and preparing labels.
To train a robust AI, split your data:
Crucial: Never train on validation data! This causes Data Leakage, where the model memorizes answers but fails in the real world.
Split your data automatically using Ultralytics Data Split or using the Platform.
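To show what an automatic split does under the hood, here is a plain-Python sketch. It is not the Ultralytics utility; `split_dataset`, the 80/10/10 fractions, and the file names are all hypothetical.

```python
import random

def split_dataset(items, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle once, then cut into disjoint train/val/test lists."""
    items = sorted(items)               # stable starting order
    random.Random(seed).shuffle(items)  # reproducible shuffle
    n_train = int(len(items) * fractions[0])
    n_val = int(len(items) * fractions[1])
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

names = [f"img_{i:03d}.jpg" for i in range(100)]
train, val, test = split_dataset(names)
print(len(train), len(val), len(test))
```

Because every image lands in exactly one list, the splits cannot overlap by construction.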
The Problem: Validation/test data accidentally ends up in the training set.
The Danger: Val metrics look “perfect,” but the model fails in production.
How to Avoid:
- Use autosplit() for automatic, safe partitioning.
- Avoid manually copying images between split folders.
- Check for duplicates/hashes before training.
Suspiciously perfect metrics? Check for leakage!
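One way to run the duplicate check is to hash file contents and look for overlaps between splits. This is a sketch with hypothetical helper names; it only catches byte-identical copies, so resized or re-encoded duplicates would need a perceptual comparison instead.

```python
import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    """SHA-256 of the file's bytes; identical files give identical hashes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def find_leaks(train_dir, val_dir):
    """Return val files whose content also appears somewhere in train."""
    train_hashes = {file_hash(p) for p in Path(train_dir).rglob("*") if p.is_file()}
    return sorted(p for p in Path(val_dir).rglob("*")
                  if p.is_file() and file_hash(p) in train_hashes)
```

Calling `find_leaks("images/train", "images/val")` on your own folders (paths assumed) should return an empty list before you start training.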
Different formats for different tasks:
Classification: images sorted into one folder per class (e.g., train/cat/, train/dog/).
Detection and Segmentation: .txt label files.
Object Detection (.txt Bounding Box): <class_index> <x_center> <y_center> <width> <height>
Instance Segmentation (.txt Polygon): <class_index> <x1> <y1> <x2> <y2> ... <xn> <yn>
⚠ Note: Coordinates must always be normalized (0.0 to 1.0) to keep annotations independent of image resolution!
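A short sketch of the normalization step for a bounding box. It assumes the box starts out as pixel-space corners; the helper name `to_yolo_line` and the example numbers are hypothetical.

```python
def to_yolo_line(class_index, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box to a normalized YOLO label line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_index} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# A 200x100 px box with top-left corner (220, 190) in a 640x480 image
print(to_yolo_line(0, 220, 190, 420, 290, img_w=640, img_h=480))
```

The same label stays valid if the image is later resized, since every value is a fraction of the image size.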
data.yaml File
Linking indices to names and defining paths.
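A sketch of what such a file might contain, generated from Python so the class indices stay in sync with a list of names. The dataset root, folder layout, and class names are hypothetical; the `path`, `train`, `val`, and `names` keys follow the usual Ultralytics dataset layout.

```python
def build_data_yaml(root, class_names):
    """Render a minimal dataset config: paths plus an index-to-name map."""
    lines = [
        f"path: {root}",          # dataset root directory
        "train: images/train",    # relative to path
        "val: images/val",        # relative to path
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"

text = build_data_yaml("datasets/cars", ["red_car", "other_car"])
print(text)
```

The index in front of each name is the same `<class_index>` that appears at the start of every label line.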
Typing coordinates manually is impractical and error-prone. Use specialized tools to draw boxes or polygons visually:
Docs Reference: Ultralytics HUB
Speed up the process by letting foundation models do the heavy lifting:
Docs Reference: Segment Anything Models (SAM)
Hyperparameters and Configs.
We can start training directly from the command line or Python API:
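A hedged sketch of both entry points. The helper names are hypothetical, the argument values are the documented defaults, and the CLI string assumes the `yolo` command that ships with the package. The library import sits inside `train` so the helpers can be inspected without it installed.

```python
def build_train_args(data="data.yaml", epochs=100, imgsz=640, batch=16):
    """Collect training arguments in one place (values shown are the defaults)."""
    return {"data": data, "epochs": epochs, "imgsz": imgsz, "batch": batch}

def train(weights="yolo26n.pt", **overrides):
    """Start training from pretrained weights with the given arguments."""
    from ultralytics import YOLO  # imported here so the helpers load without it
    model = YOLO(weights)
    return model.train(**build_train_args(**overrides))

# Rough CLI equivalent of train(), assuming the `yolo` entry point is installed
args = build_train_args()
cli = "yolo detect train model=yolo26n.pt " + " ".join(f"{k}={v}" for k, v in args.items())
print(cli)
```

Calling `train(epochs=50)` would override a single argument and keep the rest at their defaults.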
If your training is interrupted (e.g., power outage or timeout), you can easily resume it from the last saved weights without losing progress!
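Resuming might look like the sketch below. It assumes checkpoints are saved as `last.pt` somewhere under a `runs` directory, which is the usual default but worth checking for your setup; both helper names are hypothetical.

```python
from pathlib import Path

def latest_checkpoint(runs_dir="runs"):
    """Return the most recently modified last.pt under runs_dir, or None."""
    candidates = list(Path(runs_dir).rglob("last.pt"))
    return max(candidates, key=lambda p: p.stat().st_mtime) if candidates else None

def resume(runs_dir="runs"):
    """Continue an interrupted run from its last saved checkpoint."""
    from ultralytics import YOLO  # lazy import; only needed when resuming
    checkpoint = latest_checkpoint(runs_dir)
    if checkpoint is None:
        raise FileNotFoundError(f"no last.pt found under {runs_dir}")
    return YOLO(str(checkpoint)).train(resume=True)
```

With `resume=True`, training picks up from the saved state rather than starting over.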
.pt (Pretrained weights): Starts with general features (edges, shapes) from large datasets. Fast training, needs little data. (Always recommended!)
.yaml (Architecture Definition): Blank blueprint. Random weights, meaning you train entirely from scratch.
Use a .pt model for all custom dataset training to leverage Transfer Learning.
epochs (Default: 100): Number of full passes through the ENTIRE dataset. More epochs let the model learn more, but risk overfitting.
batch (Default: 16): Images processed at once.
Set batch: -1 to let AutoBatch pick a batch size that fits your GPU memory!
imgsz (Default: 640): Target image size. Larger sizes (e.g., 1024) capture more detail but use more memory.
device (Default: ''): Specify GPUs (e.g., device=0,1) to trigger multi-GPU training!
Learning Rate (lr0) (Default: 0.01): Controls step size—too large and the ball overshoots, too small and training is slow.
Weight Decay (Default: 0.0005): Penalizes large weights to prevent memorization (🔴 Overfitting).
Patience (patience) (Default: 100): Stop training if validation metrics haven’t improved for \(N\) epochs.
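The patience rule can be sketched in a few lines. This illustrates the idea only and is not the library's exact implementation; the scores are hypothetical validation mAP values.

```python
def stopping_epoch(val_scores, patience):
    """Return the epoch index at which training stops, or None if it never does."""
    best_score = float("-inf")
    best_epoch = 0
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # no improvement for `patience` epochs
    return None

# Hypothetical validation mAP per epoch: improves, then plateaus
scores = [0.30, 0.42, 0.51, 0.50, 0.49, 0.51, 0.50]
print(stopping_epoch(scores, patience=3))
```

Here the best score arrives at epoch 2, and three epochs later nothing has beaten it, so training stops at epoch 5.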
For reproducibility, put arguments into an experiment.yaml file:
Run with:
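A sketch of one way to produce and use such a file. The values are the defaults listed above plus a hypothetical model and dataset, and the run commands in the comments are assumptions to verify against your installed version.

```python
from pathlib import Path

# Hypothetical experiment settings; keys are the arguments described above
experiment = {
    "model": "yolo26n.pt",
    "data": "data.yaml",
    "epochs": 100,
    "batch": 16,
    "imgsz": 640,
    "lr0": 0.01,
    "weight_decay": 0.0005,
    "patience": 100,
}

def to_yaml(settings):
    """Render a flat dict as simple `key: value` YAML lines."""
    return "".join(f"{key}: {value}\n" for key, value in settings.items())

def save_experiment(settings, path="experiment.yaml"):
    """Write the settings to disk and return the path."""
    Path(path).write_text(to_yaml(settings))
    return path

print(to_yaml(experiment))
# Assumed ways to run it:
#   CLI:    yolo cfg=experiment.yaml
#   Python: pass the same keys (minus "model") as keyword arguments to model.train()
```

Keeping one file per experiment makes it easy to diff two runs and see exactly which knob changed.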
Ultralytics seamlessly integrates with MLOps tools to log metrics (mAP, loss) and visuals (confusion matrix) during training.
Supported Integrations:
- TensorBoard (Built-in)
- Weights & Biases (W&B)
- Comet / ClearML / MLflow
Docs Reference: Integrations
Reading the curves to fix what training isn’t learning.
Every training run lands somewhere on this spectrum — knowing where is the first step to fixing it.
🔴 Overfitting
Model memorizes training data.
Too much capacity for your data
🟢 Just Right
Model generalizes to new data.
The goal — stay here
🟠 Underfitting
Model never learned the patterns.
Too little capacity or training
The model memorizes training data instead of learning general patterns.
What you see: train mAP far above val mAP, with val loss rising.

The model hasn’t learned the patterns — too simple, over-constrained, or undertrained.
What you see: both train and val mAP low and flat, with loss barely moving.

In a parking lot you’ll naturally collect 9,500 empty frames for every 500 cars — and the model can hit 95% accuracy by always predicting “empty.”

Fix: oversample the minority class so the model sees Cars often enough to learn them.
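The numbers from the parking-lot example, plus a minimal oversampling sketch. Plain label strings stand in for images here, and `oversample` is a hypothetical helper for the two-class case only.

```python
import random

# 9,500 empty frames and 500 car frames, as in the parking-lot example
labels = ["empty"] * 9500 + ["car"] * 500

# Baseline that always predicts "empty"
accuracy = sum(label == "empty" for label in labels) / len(labels)
print(accuracy)

def oversample(labels, minority, seed=0):
    """Repeat random minority samples until both classes are the same size."""
    rng = random.Random(seed)
    minority_items = [label for label in labels if label == minority]
    majority_count = len(labels) - len(minority_items)
    extra = rng.choices(minority_items, k=majority_count - len(minority_items))
    return labels + extra

balanced = oversample(labels, "car")
print(balanced.count("car"), balanced.count("empty"))
```

The always-empty baseline scores 95% accuracy while finding zero cars, which is why accuracy alone is misleading on imbalanced data.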
Technical Guide: Class Balancing with YOLO
The training-side fix for 🔴 overfitting — show the model more variety so it can’t memorize.
import albumentations as A
from ultralytics import YOLO
model = YOLO("yolo26n.pt")
# Define custom transforms
custom_transforms = [
    A.Blur(blur_limit=7, p=0.5),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.5),
]
# Train the model with custom transforms
model.train(data="data.yaml", epochs=100, augmentations=custom_transforms)

| What you see | Problem | Actions |
|---|---|---|
| Train mAP >> Val mAP, Val loss rising | 🔴 Overfitting | Add augmentation · increase weight_decay · set patience |
| Both mAP low and flat, loss barely moves | 🟠 Underfitting | Bigger model · more epochs · raise lr0 · lower weight_decay |
| One class dominates predictions | ⚖️ Imbalance | Oversample minority · class weights |
| Val metrics suspiciously perfect | 🕳️ Leakage | Re-split with autosplit() · de-duplicate |
| Both losses converging, mAP climbing | 🟢 Healthy | Continue training or deploy |
Data: data.yaml, manual labeling, and auto-labeling with SAM.
Training: hyperparameters (lr0, batch, imgsz) and tracking experiments.
Next up: Part 5 (Deployment)
Thank You!
Any questions?