This repository contains the Streamlit demo for Episode 1 of the Vision Language Modelling series by "Donkey Stereotype by PrithiviDa".
YouTube: Video Link
Original Reference: Training Notebook
Dataset: Training and Testing Dataset
Demo: Host Link
The test_samples directory contains sample images for trying out the demo, and the corresponding questions are in questions.txt. If you are new to this, just pick an image and its question from the directory and experiment with them.
Note: The model demonstrated here is the EarlyFusion model from the video.