Custom Data & Training
2026-03-13
Teaching the model your custom data.
Before we dive in, let’s lock down the terminology:
- .pt (Pretrained weights): starts with general features (edges, shapes) learned from large datasets (e.g., yolo26n.pt). Fast training, needs little data. (Always recommended!)
- .yaml (Architecture definition): a blank blueprint with random weights, meaning you train entirely from scratch.

Use a .pt model for all custom dataset training to leverage Transfer Learning.

To train a robust AI, split your data into a training set (the data the model learns from) and a validation set (held-out data used to measure performance, e.g., mAP).

Crucial: Never train on validation data! This causes a data leak: the model memorizes the answers but fails in the real world.

Split your data into train and validation sets using Ultralytics Data Split or the Ultralytics Platform.
Different formats for different tasks:
- Classification (folder structure): images sorted into per-class folders (e.g., train/cat/, train/dog/). No label files needed!

All other tasks use .txt label files, one per image:

- Object Detection (bounding boxes): <class_index> <x_center> <y_center> <width> <height>
- Instance Segmentation (polygons): <class_index> <x1> <y1> <x2> <y2> ... <xn> <yn>
- Pose Estimation (keypoints): <class_index> <x_center> <y_center> <width> <height> <px1> <py1> <p1_vis> ...
⚠ Note: Coordinates must always be normalized (0.0 to 1.0) to keep annotations independent of image resolution!
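To see what the normalization means in practice, here is a minimal sketch that converts a pixel-space detection box into a normalized YOLO label line (the function name and example numbers are illustrative, not part of any library):

```python
def to_yolo_line(class_idx, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box to a normalized YOLO detection label line."""
    x_c = (x_min + x_max) / 2 / img_w   # box center x, as a fraction of width
    y_c = (y_min + y_max) / 2 / img_h   # box center y, as a fraction of height
    w = (x_max - x_min) / img_w         # box width, normalized
    h = (y_max - y_min) / img_h         # box height, normalized
    return f"{class_idx} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# A 100x200 box centered at (320, 240) in a 640x480 image:
print(to_yolo_line(0, 270, 140, 370, 340, 640, 480))
# → 0 0.500000 0.500000 0.156250 0.416667
```

Because every value is a fraction of the image size, the same label stays valid if the image is resized.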
The data.yaml File
Linking class indices to names.
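A minimal data.yaml for a hypothetical two-class detection dataset might look like this (the paths and class names are placeholders for your own):

```yaml
# data.yaml — dataset root, split locations, and class-index-to-name mapping
path: datasets/pets      # dataset root directory
train: images/train      # training images, relative to path
val: images/val          # validation images, relative to path
names:
  0: cat
  1: dog
```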
The Ultralytics Platform
Typing coordinates by hand doesn't scale. Use an annotation tool such as Ultralytics HUB, Roboflow, or CVAT.
Docs Reference: Ultralytics HUB
Speed up the labeling process using foundation models:
Docs Reference: Segment Anything Models (SAM)
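As one example of model-assisted labeling, Ultralytics ships an auto_annotate helper that pairs a detector with SAM to generate segmentation labels automatically (a sketch; the exact checkpoint filenames are assumptions, and running this downloads both models):

```python
from ultralytics.data.annotator import auto_annotate

# Use a pretrained detector to propose boxes, then SAM to turn those boxes
# into polygon segmentation labels saved as YOLO-format .txt files
auto_annotate(
    data="path/to/images",   # folder of unlabeled images
    det_model="yolo26n.pt",  # detection model for box proposals
    sam_model="sam_b.pt",    # SAM checkpoint for mask generation
)
```

The generated labels can then be reviewed and corrected in your annotation tool instead of drawn from scratch.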
Hyperparameters and Configs
We can start training directly from the command line or Python API:
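A minimal run with the pretrained nano checkpoint might look like this (a sketch; data.yaml is your dataset file, and the weights are downloaded on first use):

```python
from ultralytics import YOLO

# Load pretrained weights (transfer learning)
model = YOLO("yolo26n.pt")

# Train on the custom dataset; results are saved under runs/detect/train/
results = model.train(data="data.yaml", epochs=100, imgsz=640)
```

The CLI equivalent is `yolo detect train data=data.yaml model=yolo26n.pt epochs=100 imgsz=640`.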
If your training is interrupted (e.g., power outage or Colab timeout), you can easily resume it from the last saved weights without losing progress!
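Resuming picks up the optimizer state and epoch counter from the last checkpoint (a sketch; runs/detect/train is the default save location and may differ in your setup):

```python
from ultralytics import YOLO

# Load the last saved checkpoint of the interrupted run
model = YOLO("runs/detect/train/weights/last.pt")

# resume=True continues from the stored epoch instead of starting over
model.train(resume=True)
```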
- epochs (default: 100): number of passes through the ENTIRE dataset. More epochs mean more learning, but risk overfitting (memorizing the data).
- patience (default: 50): if metrics don't improve for this many epochs, training stops early to save time!
- batch (default: 16): number of images processed at once. Set batch: -1 for AutoBatch, which maximizes your GPU memory without crashing!
- optimizer (default: 'auto'): the algorithm guiding learning (SGD, AdamW, etc.). auto works 99% of the time.
- lr0 (default: 0.01): initial learning rate (step size).
- imgsz (default: 640): target image size. Larger sizes (e.g., 1024) capture more detail (great for tiny objects!) but use more RAM and are slower.
- device (default: ''): specify GPUs (e.g., device=0,1) to trigger multi-GPU training!

For reproducibility, put your arguments into an experiment.yaml file:
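One possible experiment.yaml (the keys mirror the train arguments above; the values shown are illustrative):

```yaml
# experiment.yaml — training arguments captured in one reproducible file
data: data.yaml
epochs: 100
patience: 50
batch: -1        # AutoBatch: pick the largest batch that fits in GPU memory
imgsz: 640
optimizer: auto
lr0: 0.01
device: 0
```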
Run with:
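Assuming the file is saved as experiment.yaml, it can be passed via the cfg argument (a sketch based on the Ultralytics CLI; check the configuration docs for your version):

```shell
# Load all training arguments from the file; extra key=value pairs still override it
yolo cfg=experiment.yaml
```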
Ultralytics integrates seamlessly with popular MLOps tools to log metrics (mAP, loss) and artifacts (e.g., the confusion matrix) during training.
Supported integrations include TensorBoard, Weights & Biases, Comet, MLflow, ClearML, and DVC, among others.
Docs Reference: Integrations
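Most loggers activate automatically once their package is installed; the Ultralytics settings object lets you toggle them explicitly (a sketch; the exact setting keys are assumptions based on the settings documentation):

```python
from ultralytics import settings

# Turn specific experiment loggers on or off before starting a training run
settings.update({"tensorboard": True, "wandb": False})
```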
YOLO natively supports Albumentations to improve model robustness; if the library is installed, a default set of augmentations is applied automatically.
```python
import albumentations as A
from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# Define custom Albumentations transforms
custom_transforms = [
    A.Blur(blur_limit=7, p=0.5),
    A.CLAHE(clip_limit=4.0, p=0.5),
    A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.5),
]

# Train the model with custom transforms
model.train(data="data.yaml", epochs=100, augmentations=custom_transforms)
```

Docs Reference: Albumentations Integration
Wrapping Up Custom Training
Next up: Evaluation & Deployment
Thank You!
Any questions?