Vi-ATISO is an efficient and versatile system built as a collection of microservices to support various search functions related to video content. It offers functionalities such as Image Retrieval, Video Retrieval, Object Search, and OCR Search. Whether you're looking for specific visual elements or events within videos, extracting text information, or searching for objects with counts, our system has you covered.
Keywords: Lifelog, Video Event Retrieval, Interactive Retrieval System
- Text-to-Image Retrieval: Leverage the CLIP and BEiT-3 models to perform text-based image retrieval, allowing users to find frames that match a textual description.
- Text-to-Video Retrieval: Utilize the CLIP2Video model for efficient text-to-video retrieval, enabling search based on both visual elements and temporal information.
- Object Search: Use the VFNet model as an object detector so that a video can be searched by the objects detected in its keyframes. Users can search for any combination of object classes, with any or a specific number of occurrences.
- OCR Search: Leverage the Vietnamese-OCR-Toolbox for keyword-based image search.
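
The object search with counts described above can be illustrated with a minimal sketch. This is not the project's actual implementation: the data layout (a mapping from keyframe ID to detected class labels), the function names `matches` and `search_keyframes`, and the convention that `None` means "any number of occurrences, at least one" are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical query format: class label -> required count.
# None means "any number of occurrences, as long as there is at least one".
Query = Dict[str, Optional[int]]


def matches(counts: Counter, query: Query) -> bool:
    """Return True if the per-class counts of a keyframe satisfy the query."""
    for label, wanted in query.items():
        found = counts.get(label, 0)
        if wanted is None:
            # Any count is accepted, but the class must be present.
            if found < 1:
                return False
        elif found != wanted:
            # A specific count was requested and must match exactly.
            return False
    return True


def search_keyframes(detections: Dict[str, List[str]], query: Query) -> List[str]:
    """Return the IDs of keyframes whose detected objects satisfy the query."""
    return [
        keyframe_id
        for keyframe_id, labels in detections.items()
        if matches(Counter(labels), query)
    ]


if __name__ == "__main__":
    # Made-up detections, standing in for the output of an object detector.
    example_detections = {
        "video1_frame001": ["person", "person", "dog"],
        "video1_frame002": ["person", "car"],
        "video2_frame010": ["dog"],
    }
    # Keyframes with exactly two persons and at least one dog.
    print(search_keyframes(example_detections, {"person": 2, "dog": None}))
```

Treating a specific count as an exact match is a design choice in this sketch; a real service might instead interpret it as "at least N" or support range queries.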
Documentation for API development and deployment:
- CLIP Image Retrieval
- BEiT-3 Image Retrieval
- CLIP2Video Video Retrieval
- Object Search
- OCR Search
This project is licensed under the MIT License.
As the project is composed of multiple services, please follow the guide for the service you want to contribute to.