How to train YOLOv8 object detection on a custom dataset

Key takeaways:
YOLOv8 is optimized for real-time object detection, balancing speed and accuracy.
Images must be annotated with class labels and bounding box coordinates.
Validate the model on a separate dataset using the best-performing weights.
Generate predictions on new images as bounding boxes with class labels.
YOLOv8 improves on previous YOLO versions and is suitable for real-time object detection applications.

YOLOv8 is a recent version in the YOLO series of models and brings notable improvements in object detection. Like its predecessors, YOLOv8 processes an input image in a single pass to predict all objects in it, treating object detection as a regression problem. The models are distinctive for this single-pass formulation and for being pretrained on large datasets, such as COCO and ImageNet. As a result, they recognize the pretrained classes accurately and learn new classes efficiently. YOLO models, including YOLOv8, balance speed and accuracy, which makes them suitable for real-time applications and trainable on a single GPU.

In this Answer, we’ll see how to train YOLOv8 for object detection on a custom dataset. Object detection is a computer vision task that involves identifying and locating objects within an image or video sequence.

Dataset

Each image must be annotated with a class name and bounding box coordinates for every object of interest. A bounding box is the rectangle that encloses the object.

Let’s consider an object detection task for identifying trees in images. Each image containing a tree would have a bounding box annotation, defined by its coordinates, that outlines the area where the tree is located. The class label associated with each annotated bounding box would be “tree.”
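As a rough illustration of what such an annotation can look like on disk, the sketch below converts a pixel-space box into the plain-text label line used by Ultralytics YOLO models: a class index followed by the box center, width, and height, each normalized by the image size. The function name, the image size, and the box values are hypothetical and chosen only for the example.

```python
def to_yolo_label(class_id, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    # Box center and size, normalized to the 0..1 range by the image dimensions
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: a "tree" (class index 0) in a 640x480 image
label_line = to_yolo_label(0, (160, 120, 480, 360), img_w=640, img_h=480)
print(label_line)  # 0 0.500000 0.500000 0.500000 0.500000
```

Each image typically gets one label file with one such line per annotated object.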

In this Answer, we’ll use the public “Thermal Dogs and People Object Detection Dataset.” It comprises 203 thermal infrared images of people and dogs, captured at various distances in parks and residential settings. The dataset is divided into training, validation, and testing subsets.
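The three subsets are usually described to YOLOv8 through a small dataset YAML file. The sketch below renders such a file from Python. The keys (path, train, val, test, names) follow the Ultralytics dataset format, but the root folder, the subfolder names, and the class names dog and person are assumptions made for illustration and should be checked against the downloaded dataset.

```python
def render_data_yaml(root, class_names):
    """Build the text of a dataset YAML file for a detection dataset."""
    lines = [
        f"path: {root}",          # dataset root folder (assumed)
        "train: train/images",    # assumed subfolder layout
        "val: valid/images",
        "test: test/images",
        "names:",
    ]
    # One "index: name" entry per class, in label-index order
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


# Hypothetical root folder and class names
yaml_text = render_data_yaml("datasets/thermal", ["dog", "person"])
print(yaml_text)
```

Writing yaml_text to a file such as data.yaml gives the configuration file that the training and validation steps point to.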

Explanation

Line 1 (task=detect): Specifies that the YOLO framework should perform the object detection task.
Line 2 (mode=train): Indicates that the model is in training mode.
Line 3 (model): Specifies the initial model or weights to be used for training. In this case, training starts from the pretrained YOLOv8 small model (yolov8s.pt).
Line 4 (data): Specifies the path to the YAML file that describes the dataset, including the image locations, the class names, and other relevant settings.
Line 5 (epochs): Sets the number of training epochs. An epoch is one complete pass through the entire training dataset.
Line 6 (imgsz=416): Defines the input image size for training. In this case, the images are resized to a square of 416 pixels on each side.
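For reference, a training command matching the explanation above can be sketched as follows. It assembles the arguments for the yolo command-line tool that ships with the ultralytics package, one argument per list entry in the same order as the explanation. The dataset path and the epoch count are assumed values, not taken from the original command.

```python
import shlex

# One argument per entry, in the order the explanation walks through them
train_args = [
    "task=detect",
    "mode=train",
    "model=yolov8s.pt",
    "data=data.yaml",  # assumed path to the dataset YAML file
    "epochs=25",       # assumed value; pick what suits your dataset
    "imgsz=416",
]

train_command = shlex.join(["yolo", *train_args])
print(train_command)

# To launch training from Python instead of a terminal (requires ultralytics):
# import subprocess
# subprocess.run(["yolo", *train_args], check=True)
```

The printed string can be pasted into a terminal or a notebook cell prefixed with an exclamation mark.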

Evaluation

Evaluating the model’s performance after training is essential. This is done on validation data, a separate set of images that the model did not see during training. A trained model is evaluated by running the YOLO framework in validation mode.

Explanation

Line 1 (task=detect): Specifies that the YOLO framework should perform the object detection task.
Line 2 (mode=val): Indicates that the model is in validation mode. During validation, the model’s performance is assessed using a separate dataset not seen during training.
Line 3 (model): Specifies the path to the trained weights file. In this case, it uses the weights of the best-performing model from training (best.pt).
Line 4 (data): Specifies the path to the YAML file that describes the dataset. This file includes the location of the validation images, the class names, and other relevant settings.
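A validation command in the same spirit might look like the sketch below. The weights path is the location where Ultralytics typically saves the first training run and is an assumption here, as is the dataset YAML path. Adjust both to your own run.

```python
import shlex

val_args = [
    "task=detect",
    "mode=val",
    "model=runs/detect/train/weights/best.pt",  # assumed output path of the training run
    "data=data.yaml",                           # assumed path to the dataset YAML file
]

val_command = shlex.join(["yolo", *val_args])
print(val_command)
```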

Inference

After training and validation, the YOLOv8 model is ready for inference. Given new, unseen images, it generates bounding boxes around detected objects along with their predicted class labels.

Explanation

Line 1 (task=detect): Specifies that the YOLO framework should perform the object detection task.
Line 2 (mode=predict): Indicates that the model is in prediction mode, which means it is used for making predictions on new, unseen data.
Line 3 (model): Specifies the path to the trained weights file. In this case, it uses the weights of the best-performing model from training (best.pt).
Line 4 (conf): Sets the confidence threshold for object detection. Detected objects with confidence scores below this threshold are ignored.
Line 5 (source): Specifies the path to the image file on which the predictions will be made.
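The prediction step can be sketched in the same way. The weights path, the threshold value, and the image name below are assumptions for illustration. The second half of the block shows, on made-up detections, what the confidence threshold does conceptually.

```python
import shlex

conf_threshold = 0.25  # assumed value; raise it to keep only more confident detections

predict_args = [
    "task=detect",
    "mode=predict",
    "model=runs/detect/train/weights/best.pt",  # assumed output path of the training run
    f"conf={conf_threshold}",
    "source=test.jpg",                          # hypothetical image file
]

predict_command = shlex.join(["yolo", *predict_args])
print(predict_command)

# Conceptual effect of the threshold on made-up (label, confidence) pairs
detections = [("person", 0.91), ("dog", 0.48), ("dog", 0.12)]
kept = [d for d in detections if d[1] >= conf_threshold]
print(kept)  # [('person', 0.91), ('dog', 0.48)]
```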

Conclusion

YOLOv8 is a significant step forward in the YOLO series, with improved object detection performance. It builds on earlier versions while keeping the single-pass approach, and its combination of accuracy and speed makes it useful in real-time applications.

Frequently asked questions

What is YOLOv8 trained on?

The pretrained YOLOv8 detection models are trained on large object detection datasets such as COCO, and they can be fine-tuned on custom datasets to detect specific objects.

How to train YOLOv8 faster?

To speed up YOLOv8 training, use a GPU, increase the batch size as far as GPU memory allows, or enable mixed-precision training. Decreasing the image resolution or training for fewer epochs also shortens training, usually at some cost in accuracy.

How many classes can YOLOv8 detect?

YOLOv8 can detect as many classes as are defined in the training dataset. Models pretrained on COCO detect 80 classes; for custom training, the class count depends on the dataset used.
