This project is a Multi-Modal Search Engine built with OpenAI's CLIP, using a Flask API for the backend and HTML/CSS for the frontend web application.
It provides a seamless web interface where users enter a text query and the system retrieves the images most relevant to that description, based on the CLIP architecture (read the CLIP paper for details).
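At a high level, CLIP embeds both text and images into the same vector space, so retrieval reduces to a nearest-neighbor lookup: encode the query text once and rank precomputed image embeddings by cosine similarity. Below is a minimal sketch of that loop, assuming the official `clip` package, a ViT-B/32 checkpoint, and precomputed L2-normalized image embeddings; the file names (`embeddings.pt`, `paths.json`) are illustrative, not this repo's exact artifacts.

```python
# Minimal text-to-image retrieval with CLIP (illustrative sketch, not this
# repo's exact code; file names below are assumptions).
import json
from pathlib import Path

import clip  # pip install git+https://github.com/openai/CLIP.git
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# Precomputed (N, 512) image embeddings, already L2-normalized, plus a
# parallel list of image file names (hypothetical artifacts).
image_embeddings = torch.load("embeddings.pt").to(device)
paths = json.loads(Path("paths.json").read_text())

def search(query: str, k: int = 5) -> list[str]:
    tokens = clip.tokenize([query]).to(device)
    with torch.no_grad():
        text_emb = model.encode_text(tokens)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    # On unit vectors, cosine similarity is a plain dot product.
    scores = (image_embeddings @ text_emb.T).squeeze(1)
    return [paths[i] for i in scores.topk(k).indices.tolist()]
```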
- This video demonstrates how to use the project's main feature.
- Sample data of 130 images is already included in the repository; alternatively, see the video, or follow the steps below:
- Place your images in `src/minidata`
- Run the notebook `src/image-processor` (a sketch of this embedding step follows this list)
- Move the data in `src/image_embeddings` and the data in `src/minidata` to `flaskapp/image_embeddings` and `flaskapp/static` respectively (caution: transfer the data, not the directories)
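The notebook step above boils down to encoding every image once with CLIP and saving the vectors for the Flask app to reuse. Here is a minimal sketch of that step, assuming the official `clip` package and JPEG inputs; the output file names are illustrative, not the notebook's actual outputs.

```python
# Sketch of the offline embedding step, assuming the official CLIP package
# (pip install git+https://github.com/openai/CLIP.git). Output file names
# below are illustrative, not the notebook's actual artifacts.
import json
from pathlib import Path

import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image_dir = Path("src/minidata")
paths, embeddings = [], []
for path in sorted(image_dir.glob("*.jpg")):
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
    with torch.no_grad():
        emb = model.encode_image(image)
    embeddings.append(emb / emb.norm(dim=-1, keepdim=True))  # L2-normalize
    paths.append(path.name)

out_dir = Path("src/image_embeddings")
out_dir.mkdir(exist_ok=True)
torch.save(torch.cat(embeddings), out_dir / "embeddings.pt")
(out_dir / "paths.json").write_text(json.dumps(paths))
```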
- Multi-Modal Search: Users can input textual descriptions to retrieve relevant images.
- Intuitive Web Interface: The frontend is built with HTML/CSS to provide a user-friendly experience.
- Scalable Backend: A Flask API serves as the backend, handling requests and interacting with the CLIP model (a minimal sketch follows this list).
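A backend route for this can stay very small: accept the query, call the retrieval helper, and return image names that the page can render from `flaskapp/static`. The `/search` route and the `search()` helper below (from the retrieval sketch above) are assumptions for illustration, not the app's actual API; see `flaskapp/` for the real code.

```python
# Hypothetical shape of the Flask backend; the /search route and the
# search() helper (from the retrieval sketch above) are assumptions.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/search", methods=["POST"])
def search_images():
    query = request.form.get("query", "")
    results = search(query, k=5)  # retrieval helper sketched earlier
    # File names under flaskapp/static can be rendered directly by the page.
    return jsonify({"query": query, "results": results})
```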
Clone the repository:
git clone https://github.com/ahmedembeddedxx/multimodal-search-engine.git
Start the backend server:
cd flaskapp/
flask run
Access the web application in your browser at http://127.0.0.1:5000/.
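If the backend exposes a JSON route like the hypothetical `/search` sketched above, you can also query it directly, e.g. with Python's `requests`:

```python
# Query the running app directly (assumes the hypothetical /search route).
import requests

resp = requests.post(
    "http://127.0.0.1:5000/search",
    data={"query": "a dog playing in the snow"},
)
print(resp.json())
```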
- Shift the app to ReactJS
- Use ImageBind by Meta AI
- More accurate model evaluation
- Integrate Audio & Video Functionality