The goal of this project is to develop a deep learning system that assists visually impaired individuals in obtaining information by describing the images they take. The system combines a CNN model and an NLP model into a single image captioning pipeline: the CNN extracts image features, and the language model generates a text sequence describing the image.
The project incorporates state-of-the-art pre-trained models (ResNet50, VGG16, and VGG19) for image feature extraction, and LSTM and Bidirectional LSTM networks for text generation. Several model combinations were evaluated; the best-performing one achieved a BLEU score of 0.61 and was deployed with Flask and pyttsx3 to provide the web interface and text-to-speech functionality in the app.
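The pipeline follows the common encoder-decoder ("merge") design for image captioning: the pre-trained CNN encodes a photo into a fixed-length feature vector, an LSTM encodes the caption generated so far, and the two streams are combined to predict the next word. The sketch below illustrates this in Keras with VGG16 and a single LSTM; the layer sizes, `vocab_size`, and `max_length` are illustrative assumptions, not the exact values used in this repository.

```python
# Minimal sketch of the CNN + LSTM "merge" captioning architecture, assuming
# Keras/TensorFlow. Layer sizes, vocab_size, and max_length are placeholders.
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add

vocab_size = 8000   # assumed size of the caption vocabulary
max_length = 35     # assumed maximum caption length in tokens

# Feature extractor: VGG16 without its classification head (fc2 output = 4096-d).
cnn = VGG16()
cnn = Model(inputs=cnn.inputs, outputs=cnn.layers[-2].output)

# Image branch: project the 4096-d CNN feature vector into the decoder dimension.
inputs1 = Input(shape=(4096,))
fe1 = Dropout(0.5)(inputs1)
fe2 = Dense(256, activation='relu')(fe1)

# Text branch: embed the partial caption and encode it with an LSTM.
inputs2 = Input(shape=(max_length,))
se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
se2 = Dropout(0.5)(se1)
se3 = LSTM(256)(se2)

# Merge the two branches and predict the next word of the caption.
decoder1 = add([fe2, se3])
decoder2 = Dense(256, activation='relu')(decoder1)
outputs = Dense(vocab_size, activation='softmax')(decoder2)

model = Model(inputs=[inputs1, inputs2], outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam')
```

At inference time the decoder runs word by word, feeding each predicted token back into the text branch until an end-of-sequence token is produced or `max_length` is reached.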
These instructions will get you a copy of the project up and running on your local machine.
- Clone the project repository from GitHub: `git clone https://github.com/ammarlodhi255/image-captioning-system-to-assist-the-blind.git`
- Navigate to the project directory: `cd image-captioning-system-to-assist-the-blind`
- Create a virtual environment for the project: `python3 -m venv env`
- Activate the virtual environment: `source env/bin/activate`
- Install the project dependencies (Flask, pyttsx3, and the deep learning libraries; use the repository's requirements file if one is provided): `pip install -r requirements.txt`
- Export the Flask app: `export FLASK_APP=app.py`
- Run the Flask app: `flask run` (see the sketch below for how the app can combine captioning with text-to-speech)
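Once running, the app exposes a web interface where an image can be submitted, captioned by the trained model, and read aloud. The sketch below shows one way Flask and pyttsx3 can be wired together for this; the route, the upload path, and the `generate_caption` placeholder are hypothetical stand-ins for the repository's actual code in `app.py`.

```python
# Minimal sketch of the Flask + pyttsx3 deployment. The route and the
# generate_caption() placeholder are hypothetical, not the repository's code.
from flask import Flask, request
import pyttsx3

app = Flask(__name__)

def generate_caption(image_path):
    # Placeholder: in the real app this would run the CNN + LSTM model on the image.
    return "a person riding a bicycle down a city street"

@app.route('/caption', methods=['POST'])
def caption():
    image = request.files['image']      # photo uploaded by the user
    path = '/tmp/' + image.filename
    image.save(path)
    text = generate_caption(path)       # describe the image
    engine = pyttsx3.init()             # text-to-speech: read the caption aloud
    engine.say(text)
    engine.runAndWait()
    return text
```

Note that pyttsx3 is an offline engine that speaks through the host machine's audio device, which suits a locally hosted app of this kind.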
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request