A Streamlit web application that performs object detection using YOLOv4 and generates image captions using the BLIP transformer model.
- Upload images for object detection
- Object detection on uploaded images using YOLOv4 (via cvlib)
- Image captioning using BLIP (Salesforce)
- Clean and intuitive user interface
- Object detection confidence scores
- Automatic image resizing for optimal processing
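The automatic resizing mentioned above can be sketched as a small Pillow helper. The 720-pixel cap below is an illustrative assumption, not a value taken from the app:

```python
from PIL import Image

MAX_SIDE = 720  # assumed cap on the longer side; tune for your hardware


def resize_for_processing(img: Image.Image, max_side: int = MAX_SIDE) -> Image.Image:
    """Downscale so the longer side is at most max_side, keeping aspect ratio."""
    w, h = img.size
    scale = max_side / max(w, h)
    if scale >= 1:
        return img  # already small enough, no upscaling
    return img.resize((int(w * scale), int(h * scale)), Image.LANCZOS)
```

Keeping detection inputs small like this speeds up YOLO inference with little loss in accuracy for common objects.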
- Clone the repository:
git clone <repository-url>
cd DetectiNator
- Install the required dependencies:
pip install -r requirements.txt
Required packages:
- streamlit
- transformers
- Pillow (imported as PIL)
- torch
- cvlib
- opencv-python (cv2)
- numpy
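If the repository's requirements.txt is missing, a minimal one matching the package list above might look like this (unpinned; pin versions as needed):

```text
streamlit
transformers
Pillow
torch
cvlib
opencv-python
numpy
```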
- Run the Streamlit app:
streamlit run main.py
- Upload an image using the file uploader
- Click "Detect" to perform object detection
- View the detected objects with bounding boxes
- Read the automatically generated caption describing the scene
- Image Upload: Users can upload images in PNG or JPG format
- Object Detection: Uses YOLOv4 model through cvlib to detect common objects
- Visualization: Displays detected objects with bounding boxes
- Captioning: Generates descriptive captions using BLIP transformer model
- Display: Shows both the annotated image and generated caption
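The pipeline above can be sketched as follows. This is a minimal outline, not the app's exact main.py: the `keep_confident` helper and its 0.5 threshold are illustrative assumptions, while the YOLOv4 model name and the Salesforce/blip-image-captioning-base checkpoint come from this document. Heavy imports live inside the function so the helper stays importable without the deep-learning stack installed:

```python
def keep_confident(labels, confidences, threshold=0.5):
    """Filter detections, keeping labels whose confidence meets the threshold."""
    return [(lbl, conf) for lbl, conf in zip(labels, confidences) if conf >= threshold]


def detect_and_caption(image_path):
    """Run cvlib's YOLOv4 detector and BLIP captioning on a single image."""
    import cv2
    import cvlib as cv
    from cvlib.object_detection import draw_bbox
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    # Object detection: cvlib returns bounding boxes, labels, and confidences
    img = cv2.imread(image_path)
    bboxes, labels, confs = cv.detect_common_objects(img, model="yolov4")
    annotated = draw_bbox(img, bboxes, labels, confs)

    # Captioning: BLIP generates a short natural-language description
    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    model = BlipForConditionalGeneration.from_pretrained(
        "Salesforce/blip-image-captioning-base"
    )
    pil_img = Image.open(image_path).convert("RGB")
    inputs = processor(pil_img, return_tensors="pt")
    caption = processor.decode(model.generate(**inputs)[0], skip_special_tokens=True)
    return annotated, keep_confident(labels, confs), caption
```

In a Streamlit app, the annotated array and caption would then be shown with st.image and st.write.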
- Object Detection: YOLOv4 (via cvlib)
- Image Captioning: BLIP (Salesforce/blip-image-captioning-base)
- Frontend: Streamlit
- Image Processing: OpenCV
- Deep Learning: PyTorch
- Python 3.7+
- Adequate RAM for model inference
- GPU recommended for faster processing
- Internet connection for model downloads
- Processes only static images (no video or webcam streams)
- Detection is limited to the common object classes cvlib's YOLOv4 model was trained on
- Requires stable internet for first-time model downloads