A web app that generates brief captions for images. We used a merge model similar to the "Show and Tell" architecture to generate the captions, and trained it on the Flickr8k dataset using Google Colab.
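In a merge model, the image and the partial caption are encoded separately and combined only afterwards, just before the next word is predicted. The caption is then produced one word at a time. The pure-Python sketch below illustrates both ideas with hand-picked toy numbers. It is not the project's code. The `startseq`/`endseq` tokens, the function names, and the simplified ReLU-plus-linear output layer are assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence

# Boundary tokens are an assumption (the convention used in Brownlee's tutorial).
START, END = "startseq", "endseq"


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def merge_step(image_vec: Sequence[float], text_vec: Sequence[float],
               out_weights: Sequence[Sequence[float]]) -> List[float]:
    """Simplified merge-model decoder step.

    The image encoding and the encoding of the partial caption meet only here:
    they are added element-wise, passed through ReLU, and linearly projected to
    one score per vocabulary word (a trained model would use learned dense layers).
    """
    merged = [max(0.0, a + b) for a, b in zip(image_vec, text_vec)]
    scores = [sum(w * m for w, m in zip(row, merged)) for row in out_weights]
    return softmax(scores)


def greedy_caption(predict: Callable[[List[str]], List[float]],
                   vocab: List[str], max_len: int = 20) -> str:
    """Greedy decoding: append the most probable next word until END or max_len."""
    words = [START]
    for _ in range(max_len):
        probs = predict(words)
        best = vocab[max(range(len(probs)), key=probs.__getitem__)]
        if best == END:
            break
        words.append(best)
    return " ".join(words[1:])  # drop the start token


# Toy example with hand-picked numbers: hidden size 3, vocabulary of 3 words.
VOCAB = ["dog", "runs", END]
IMAGE_VEC = [1.0, 0.0, 0.0]  # stand-in for the projected CNN image features
# Stand-in for the RNN encoding of the partial caption; keyed on the last word only for brevity.
TEXT_VEC = {START: [0.0, 0.0, 0.0], "dog": [-1.0, 2.0, 0.0], "runs": [-1.0, 0.0, 2.0]}
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def toy_predict(words: List[str]) -> List[float]:
    return merge_step(IMAGE_VEC, TEXT_VEC[words[-1]], IDENTITY)


print(greedy_caption(toy_predict, VOCAB))  # dog runs
```

Greedy decoding is the simplest strategy. Beam search, which keeps several candidate captions at each step, often produces better captions at the cost of extra computation.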
The backend API is built with FastAPI, a Python web framework, and the frontend uses the Streamlit framework.
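Because the frontend and backend are separate services, the frontend presumably sends each uploaded image to the backend over HTTP. A request of that kind can be reproduced with only the Python standard library, as sketched below. The route `/predict` and the form-field name `file` are hypothetical; the real ones are listed on the backend's `/docs` page.

```python
import os
import urllib.request
import uuid
from typing import Tuple

# Hypothetical endpoint and field name: check the real ones at http://127.0.0.1:8000/docs.
PREDICT_URL = "http://127.0.0.1:8000/predict"
FIELD_NAME = "file"


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "image/jpeg") -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def request_caption(image_path: str, url: str = PREDICT_URL) -> str:
    """POST an image to the backend and return the raw response body as text."""
    with open(image_path, "rb") as f:
        body, ctype = build_multipart(FIELD_NAME, os.path.basename(image_path), f.read())
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": ctype})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read().decode("utf-8")
```

With the containers running, `request_caption("dog.jpg")` would return whatever the backend responds with, assuming the route and field name above match the real ones.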
Below is a demo output of our model:
Basic requirements
- Python 3
- Docker
- Docker Compose
Steps to run on your local system:
git clone https://github.com/jaykshirsagar05/captionify.git
cd captionify
docker-compose build
docker-compose up
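Visit http://172.19.0.3:8501/ for the Streamlit app (this is a container address on the Docker network, so it may differ on your machine).
Visit http://127.0.0.1:8000/docs for the server side (FastAPI); this page shows the API's interactive documentation.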
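Both services can take a few seconds to start after `docker-compose up`. A small standard-library script like the sketch below can poll the two URLs until they respond. The function names are made up for illustration, and the Streamlit address may need adjusting as noted above.

```python
import time
import urllib.error
import urllib.request
from typing import Dict

# URLs from the steps above; the Streamlit IP may differ on your Docker network.
SERVICES = {
    "Streamlit app": "http://172.19.0.3:8501/",
    "FastAPI docs": "http://127.0.0.1:8000/docs",
}


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if a server at `url` answers with a non-5xx HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        return err.code < 500  # e.g. a 404 still means the server is running
    except OSError:
        return False  # refused, timed out, unresolved (URLError is an OSError subclass)


def wait_for(url: str, attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll `url` until it responds, giving up after `attempts` tries."""
    for attempt in range(attempts):
        if is_up(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


def check_all(services: Dict[str, str] = SERVICES,
              attempts: int = 30, delay: float = 2.0) -> Dict[str, bool]:
    """Wait for every service and report which ones came up."""
    return {name: wait_for(url, attempts, delay) for name, url in services.items()}
```

Calling `check_all()` in a second terminal returns a dictionary such as `{"Streamlit app": True, "FastAPI docs": True}` once both containers are serving.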
NOTE: You need to change the path to the pre-trained model in the file.
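One way to avoid editing source code whenever the model moves is to read the path from an environment variable and fail early if the file is missing. In the sketch below, the variable name `CAPTIONIFY_MODEL_PATH` and the default path `models/model.h5` are hypothetical placeholders, not names used by the repository.

```python
import os
from pathlib import Path
from typing import Mapping

# Hypothetical variable name and default; per the note above, the project sets the path in a file.
ENV_VAR = "CAPTIONIFY_MODEL_PATH"
DEFAULT_MODEL_PATH = "models/model.h5"  # placeholder, not the project's real path


def resolve_model_path(env: Mapping[str, str] = os.environ) -> Path:
    """Return the model path from `env` (or the default), failing early if the file is missing."""
    path = Path(env.get(ENV_VAR, DEFAULT_MODEL_PATH)).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Pre-trained model not found at {path}; set {ENV_VAR}.")
    return path
```

With Docker Compose, such a variable could be set under the backend service's `environment:` key, and the backend would call a function like this once at startup before loading the model.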
This project is open source.
Anyone is welcome to contribute to it.
- Brownlee, J. (2019, June 27). How to Develop a Deep Learning Photo Caption Generator from Scratch. Retrieved September 26, 2020, from https://machinelearningmastery.com/develop-a-deep-learning-caption-generation-model-in-python/
- Davidefiocco. (n.d.). Davidefiocco/streamlit-fastapi-model-serving. Retrieved September 26, 2020, from https://github.com/davidefiocco/streamlit-fastapi-model-serving
- Xu, K., Ba, J. L., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., . . . Bengio, Y. (2016, April 19). Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. arXiv. Retrieved September 26, 2020, from https://arxiv.org/pdf/1502.03044.pdf
The project is mostly complete, but the model's predictions are not yet accurate. Further improvements are welcome!