This repo demonstrates PyTorch implementations of prominent vision-language models.
- Current implementations:
- CLIP model:
- Original paper: https://arxiv.org/abs/2103.00020
- Official implementation: https://github.com/openai/CLIP
- Implementation adapted from: https://github.com/moein-shariatnia/OpenAI-CLIP
- Training code is available as modules and scripts.
- Zero-shot classification inference is provided as a notebook; a minimal sketch of the idea follows this list.
- LiT model:
- Original paper: https://arxiv.org/abs/2111.07991
- Official implementation: https://github.com/google-research/vision_transformer/blob/main/model_cards/lit.md
- In a nutshell, LiT differs from the original CLIP in that it locks (freezes) a pre-trained image tower and trains/tunes only the text tower; a sketch of such a training step also follows this list.
- CoCa model:
- Original paper: https://arxiv.org/pdf/2205.01917v2.pdf
- Upcoming.
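
Below is a minimal sketch of how zero-shot classification with a CLIP-style model typically works. The `encode_image`/`encode_text` methods and the `tokenizer` callable are illustrative assumptions, not this repo's actual API; see the notebook for the real inference code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_classify(model, tokenizer, image, class_names, device="cpu"):
    """Score a single image against text prompts built from class names.

    `model.encode_image` / `model.encode_text` and `tokenizer` are
    hypothetical placeholders; adapt them to the repo's CLIP modules.
    """
    model.eval()
    prompts = [f"a photo of a {name}" for name in class_names]
    tokens = tokenizer(prompts).to(device)          # hypothetical: (num_classes, seq_len)

    # L2-normalize so the dot product below is a cosine similarity.
    image_emb = F.normalize(model.encode_image(image.to(device)), dim=-1)
    text_emb = F.normalize(model.encode_text(tokens), dim=-1)

    # Softmax over classes turns similarities into per-class probabilities.
    logits = image_emb @ text_emb.T                 # (1, num_classes)
    probs = logits.softmax(dim=-1).squeeze(0)
    return {name: p.item() for name, p in zip(class_names, probs)}
```

Called with a class list such as `["cat", "dog", "car"]`, this returns one probability per class for the input image, with no task-specific fine-tuning.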
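And a sketch of the core LiT idea, under the same hypothetical module names: the image tower is locked (frozen, no gradients) and only the text tower is updated by the symmetric contrastive loss.

```python
import torch
import torch.nn.functional as F

def lit_training_step(image_tower, text_tower, optimizer, images, tokens, temperature=0.07):
    """One LiT-style contrastive step: locked image tower, trainable text tower.

    `image_tower` / `text_tower` are hypothetical nn.Modules returning
    embeddings; `optimizer` should hold only the text tower's parameters.
    """
    # Lock the pre-trained image tower: no gradients, eval-mode statistics.
    for p in image_tower.parameters():
        p.requires_grad = False
    image_tower.eval()

    with torch.no_grad():
        img_emb = F.normalize(image_tower(images), dim=-1)
    txt_emb = F.normalize(text_tower(tokens), dim=-1)

    # Symmetric InfoNCE over in-batch image/text pairs, as in CLIP-style training.
    logits = (txt_emb @ img_emb.T) / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    loss = 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```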