A custom implementation of YOLOv8 for detecting birds in images and videos. This project provides a complete pipeline from dataset preparation to model training and inference.
- Features
- Prerequisites
- Installation
- Usage
- Dataset Preparation
- Model Training
- Inference
- Best Practices
- GPU Requirements
## Features
- Custom YOLOv8 model training for bird detection
- Support for both image and video inference
- Real-time object tracking in videos
- Configurable confidence thresholds
- Automatic annotation visualization
- Support for both local files and URLs
- Save capabilities for annotated images and videos
## Prerequisites
- Python 3.8+
- CUDA-compatible GPU (recommended)
- Google Colab (optional, for free GPU access)
## Installation
- Clone this repository:
```bash
git clone https://github.com/Pushtogithub23/birds-detection-yolov8.git
cd birds-detection-yolov8
```
- Install the required packages:
```bash
pip install -r requirements.txt
```
## Usage
```python
# For a local image
display_prediction("path/to/your/image.jpg", save_fig=True, filename="detected.jpg")

# For an image URL
display_prediction("https://example.com/image.jpg", save_fig=True, filename="detected.jpg")
```
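Since the helper accepts both local paths and URLs (per the feature list), one way to tell the two apart is a scheme check. This is a hypothetical sketch, not the project's actual code; `is_url` is an invented name:

```python
from urllib.parse import urlparse

def is_url(source: str) -> bool:
    """Return True if `source` is an http(s) URL rather than a local path."""
    return urlparse(source).scheme in ("http", "https")

# A helper like display_prediction could then dispatch on the source type,
# e.g. download the image when is_url(source) is True, else read it from disk.
```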
I have attached a few example detections on images below:
```python
predict_in_videos(
    "path/to/your/video.mp4",
    save_video=True,
    filename="detected_video.mp4"
)
```
I have attached a video detection in GIF format below.
## Dataset Preparation
- Create a Roboflow account and obtain your API key
- Update the API key in the notebook:
```python
rf = Roboflow(api_key="YOUR_API_KEY")
```
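Once the key is set, the dataset export is typically pulled with Roboflow's standard download call. The workspace and project names below are placeholders, not the project's actual identifiers:

```python
from roboflow import Roboflow  # pip install roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
# Placeholder workspace/project names - replace with your own
project = rf.workspace("your-workspace").project("your-project")
dataset = project.version(1).download("yolov8")  # writes the dataset folder locally
```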
- The dataset structure should be:
```
BIRDS-DETECTION-1/
├── train/
├── valid/
├── test/
└── data.yaml
```
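The `data.yaml` produced by Roboflow's YOLOv8 export typically looks like the following; the class list here is an assumption for a single-class bird dataset, so check the generated file:

```yaml
train: ../train/images
val: ../valid/images
test: ../test/images

nc: 1
names: ['bird']
```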
## Model Training
The model is trained using YOLOv8-large (`yolov8l.pt`) as the base model:
```python
model = YOLO('yolov8l.pt')
model.train(
    data="BIRDS-DETECTION-1/data.yaml",
    epochs=100,
    imgsz=640
)
```
Training parameters can be adjusted based on your needs:
- `epochs`: Number of training epochs
- `imgsz`: Input image size
- `batch`: Batch size (adjust based on GPU memory)
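For example, a memory-constrained run might override the batch size. The values here are illustrative, not the settings used for the released weights:

```python
# Illustrative training configuration; lower `batch` if you hit CUDA
# out-of-memory errors, or raise it on larger GPUs.
train_args = dict(
    data="BIRDS-DETECTION-1/data.yaml",
    epochs=100,
    imgsz=640,
    batch=8,
)
# from ultralytics import YOLO
# YOLO("yolov8l.pt").train(**train_args)
```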
## Inference
- Automatic thickness calculation based on image resolution
- Dynamic text scaling
- Confidence threshold filtering (default: 0.5)
- Color-coded annotations using 'magma' colormap
- Support for both image and video processing
- Object tracking in videos using ByteTrack
- Real-time object tracking
- FPS-synchronized processing
- Progress visualization
- Press 'p' to stop processing
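The resolution-dependent thickness and text scaling mentioned above can be sketched roughly as follows; the exact constants are an assumption, not the project's values:

```python
def annotation_scale(img_h: int, img_w: int) -> tuple[int, float]:
    """Scale line thickness and font size with image resolution (hypothetical formula)."""
    ref = max(img_h, img_w)
    thickness = max(1, round(ref / 500))   # thicker boxes on larger images
    font_scale = max(0.4, ref / 1500)      # larger labels on larger images
    return thickness, font_scale
```

Values like these would then feed OpenCV drawing calls such as `cv2.rectangle` and `cv2.putText`, so annotations stay legible at any resolution.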
## Best Practices
- Ensure the dataset is well-balanced
- Include images with varying lighting conditions
- Use data augmentation for better generalization
- Monitor training metrics in the `runs/detect/train` directory
- Adjust confidence threshold based on application needs
- Consider model ensembling for better results
- Move API keys to environment variables
- Implement proper error handling
- Consider model quantization for faster inference
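Moving the API key to an environment variable could look like this; `get_api_key` is a hypothetical helper and `ROBOFLOW_API_KEY` an assumed variable name:

```python
import os

def get_api_key(var_name: str = "ROBOFLOW_API_KEY") -> str:
    """Read an API key from the environment, failing loudly if it is unset."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key

# rf = Roboflow(api_key=get_api_key())  # instead of a hard-coded string
```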
## GPU Requirements
This project was developed and tested on Google Colab with T4 GPU acceleration. When running on different hardware:
- Adjust batch size according to available GPU memory
- Modify image size if needed
- Consider using model quantization for faster inference on less powerful hardware
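As a sketch of the quantization suggestion, Ultralytics models can be exported with half-precision weights; the weight path below is an example, and device requirements for FP16 export should be checked against the Ultralytics docs:

```python
# Illustrative export settings: half-precision ONNX for faster
# inference on less powerful hardware.
export_args = dict(format="onnx", half=True)
# from ultralytics import YOLO
# YOLO("runs/detect/train/weights/best.pt").export(**export_args)
```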