This project leverages the YOLOv9 model for detecting and segmenting potholes in images and videos. The model is trained on a custom dataset using advanced techniques like mixed precision training, gradient accumulation, and optimized hyperparameters. This README outlines the project's setup, features, and key results.
- 🌟 Introduction
- ✨ Features
- ⚙️ Installation
- 📂 Dataset
- 🛠️ Training
- 📊 Results
- 🎥 Inference on Video
- 📝 License
The goal of this project is to accurately detect and segment potholes from images and videos using a custom-trained YOLOv9 model. The model is fine-tuned on a pothole segmentation dataset, achieving robust performance metrics. This README covers how to set up the environment, train the model, and perform inference on images and videos.
- YOLOv9c-seg Model: Utilizes a custom-trained YOLOv9 model specifically for pothole detection and segmentation.
- Data Augmentation: Includes random horizontal flips, resizing, and color jitter to improve model generalization.
- Optimized Training: Hyperparameters like learning rate, batch size, and dropout rate are optimized for better performance.
- Mixed Precision Training: Speeds up training and reduces memory consumption.
- Inference on Video: The model is capable of making real-time inferences on video streams.
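For a quick sanity check, the snippet below sketches how the ultralytics API can load a segmentation checkpoint and run it on a single image; the image path and the use of the base `yolov9c-seg.pt` weights (rather than the fine-tuned ones) are placeholders.

```python
from ultralytics import YOLO

# Load a YOLOv9 segmentation checkpoint; swap in the fine-tuned weights
# once training has finished. "road.jpg" is a placeholder image path.
model = YOLO("yolov9c-seg.pt")
results = model("road.jpg")

# Each result carries detection boxes and, for -seg models, instance masks.
for r in results:
    print(r.boxes.xyxy)            # bounding boxes as (x1, y1, x2, y2)
    if r.masks is not None:
        print(r.masks.data.shape)  # per-instance mask tensors
```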
Ensure you have the following installed:
- Python 3.7+
- PyTorch
- YOLOv9 (ultralytics package)
- OpenCV
- Other dependencies: numpy, matplotlib, seaborn, Pillow, ffmpeg
- Clone the repository: `git clone <repository-url>`, then `cd <repository-directory>`
- Install the required Python packages: `pip install ultralytics numpy matplotlib seaborn opencv-python pillow`
- Download the dataset: use the Kaggle CLI or download it manually from the Kaggle website.
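If you prefer to script the download, a rough sketch using the Kaggle Python API is shown below; the dataset slug `user/pothole-segmentation` is a placeholder, so substitute the identifier shown on the dataset's Kaggle page, and make sure your Kaggle API token is configured first.

```python
from kaggle.api.kaggle_api_extended import KaggleApi

# Requires a configured API token at ~/.kaggle/kaggle.json.
api = KaggleApi()
api.authenticate()

# "user/pothole-segmentation" is a placeholder slug; replace it with the
# real dataset identifier from Kaggle.
api.dataset_download_files("user/pothole-segmentation",
                           path="datasets/potholes", unzip=True)
```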
The dataset used for training and validation contains images of roads with labeled potholes. The images are divided into `train`, `valid`, and `test` directories:
- Train Directory: Contains the images used for training.
- Validation Directory: Used to validate the model's performance during training.
- Test Directory: Used to evaluate the final model's performance.
You can download the dataset from Kaggle.
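The ultralytics trainer reads these splits from a dataset YAML. A minimal sketch that writes one is shown below; the directory layout and the single `pothole` class name are assumptions, so adjust them to match the extracted dataset.

```python
from pathlib import Path

# Minimal dataset config for the ultralytics trainer. The paths and the
# single "pothole" class are assumptions; edit them to match your data.
data_yaml = """\
path: datasets/potholes
train: train/images
val: valid/images
test: test/images
names:
  0: pothole
"""
Path("data.yaml").write_text(data_yaml)
```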
The model is trained using the following setup:
- Model: YOLOv9 with segmentation head (`yolov9c-seg.pt`)
- Epochs: 30
- Image Size: 512x512 pixels
- Batch Size: 30
- Optimizer: AdamW
- Learning Rate: initial `lr0=0.00001`, final `lrf=0.01`
- Dropout Rate: 0.15
- Device: CUDA (GPU)
- Data Augmentation: Techniques such as random flipping, resizing, and color jitter are applied to the images to enhance the model's ability to generalize.
- Model Training: The model is trained over multiple epochs, and its performance is evaluated on a validation set after each epoch.
- Checkpointing: The best model weights based on validation performance are saved during training.
- Hyperparameter Tuning: Key parameters such as the learning rate, batch size, and dropout rate are optimized for best performance.
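A minimal training sketch that maps the settings above onto the ultralytics `train()` call could look like the following; the `data.yaml` path is a placeholder, and the augmentation values shown are the library defaults rather than settings confirmed by this project.

```python
from ultralytics import YOLO

# Start from the pretrained segmentation checkpoint.
model = YOLO("yolov9c-seg.pt")

# Hyperparameters mirroring the setup above; "data.yaml" is a placeholder
# for the dataset config created earlier.
model.train(
    data="data.yaml",
    epochs=30,
    imgsz=512,
    batch=30,
    optimizer="AdamW",
    lr0=0.00001,
    lrf=0.01,
    dropout=0.15,
    device=0,                            # CUDA GPU
    amp=True,                            # mixed precision training
    fliplr=0.5,                          # random horizontal flips
    hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,   # color jitter
)
```

By default the trainer checkpoints the best weights (by validation fitness) under `runs/segment/train*/weights/best.pt`.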
After training, the model achieved the following performance metrics:
- Precision (P): 0.757
- Recall (R): 0.726
- mAP@50: 0.807
- mAP@50-95: 0.519
- Mask Precision (P): 0.763
- Mask Recall (R): 0.746
- Mask mAP@50: 0.825
- Mask mAP@50-95: 0.498
These metrics indicate strong performance in detecting and segmenting potholes across various test images.
The model can perform real-time inference on video streams to detect and segment potholes:
- Video Processing: The trained model is used to process each frame of the video and apply the segmentation mask.
- Video Output: The processed video is saved in a specified format and can be viewed to assess the model's performance in real-world scenarios.
To convert the video to a suitable format (e.g., MP4) and view the results, use tools like FFmpeg.
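A rough per-frame inference sketch using OpenCV is shown below; the weights path `runs/segment/train/weights/best.pt` and the video file names are placeholders for your own training run and footage.

```python
import cv2
from ultralytics import YOLO

# Placeholder paths: point these at your trained weights and input video.
model = YOLO("runs/segment/train/weights/best.pt")
cap = cv2.VideoCapture("pothole_video.mp4")

fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
writer = cv2.VideoWriter("pothole_segmented.mp4",
                         cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)   # run detection + segmentation
    writer.write(results[0].plot())         # frame annotated with masks/boxes

cap.release()
writer.release()
```

If the resulting file does not play back cleanly, re-encode it with FFmpeg, e.g. `ffmpeg -i pothole_segmented.mp4 -c:v libx264 output.mp4`.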
This project is licensed under the MIT License. See the LICENSE file for more details.