
how to infer text-img pair demo? #8

Open
WinstonDeng opened this issue Nov 12, 2024 · 18 comments

Comments

@WinstonDeng

Using the official OpenAI text model, the text embedding dimension is 768, which mismatches the LLM2CLIP image embedding dimension of 1280.

```python
from transformers import CLIPModel, CLIPTokenizer

text_model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14-336")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14-336")
inputs = tokenizer(text=texts, padding=True, return_tensors="pt").to(device)
text_features = text_model.get_text_features(**inputs)  # shape: [1, 768]
```
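For context, here is a minimal NumPy sketch of the mismatch (illustrative shapes; the random matrix is only a stand-in for the trained adapter/projector, not the released weights):

```python
import numpy as np

rng = np.random.default_rng(0)
text_feat = rng.normal(size=(1, 768))   # OpenAI CLIP ViT-L/14 text feature dim
img_feat = rng.normal(size=(1, 1280))   # LLM2CLIP image feature dim

# Cosine similarity is undefined across mismatched widths, so a
# (768 -> 1280) projection must be applied to the text features first.
# This random matrix is a stand-in for the trained adapter/projector.
proj = rng.normal(size=(768, 1280)) / np.sqrt(768)
projected = text_feat @ proj            # shape (1, 1280)

cos = float((projected @ img_feat.T) /
            (np.linalg.norm(projected) * np.linalg.norm(img_feat)))
```

With a properly trained projector (rather than this random stand-in), `cos` becomes a meaningful text-image similarity score.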
@Yif-Yang
Collaborator

We will update the README about this today and will let you know once it is added. Thanks for the reminder.

@BIGBALLON

Same question here. By the way, do you have a plan to release the CC-finetuned LLM?

@Yif-Yang
Collaborator

> Same question here. btw, do you have a plan to release CC Finetuned LLM?

We’ll do our best to release it within 24 hours. Thank you for the reminder. If you have any other requests, feel free to let us know. We’re happy to release whatever we can, as long as it complies with safety regulations.

@eek

eek commented Nov 15, 2024

Any updates on this @Yif-Yang?

@BIGBALLON

@eek
The CVPR deadline has just passed, and the authors have likely been rushing for the past few days.
Let's wait a bit; the code will be open-sourced very soon.

@Yif-Yang
Collaborator

Thanks so much for your understanding; we will look at it tonight. 😁

@Yif-Yang
Collaborator

@BIGBALLON You are really kind and thoughtful 🌹🌹🌹

@eek

eek commented Nov 15, 2024

I didn't mean to come across as rude; you did a fantastic job, @Yif-Yang! I'm really curious and excited to play with it, so I'm looking forward to it 😀 I'll wait patiently 😀 Congrats again!

@Yif-Yang
Collaborator

@eek @BIGBALLON @WinstonDeng @mtodd

We have updated the caption contrastive fine-tuned version of Llama3-8B-CC (https://huggingface.co/microsoft/LLM2CLIP-Llama-3-8B-Instruct-CC-Finetuned) to assist with your retrieval experiments and training of your own CLIP models. Additionally, the parameters for our adapter and projector have been made available in our OpenAI ViT-L repository (https://huggingface.co/microsoft/LLM2CLIP-Openai-L-14-336). The retrieval testing methods are documented in the model card for reference.

Our tests show retrieval performance exceeding the results reported in the paper, and we encourage you to try it out.

Regarding the EVA series of models, there have been precision mismatches during the conversion to Hugging Face, which are currently being fixed. Updates will be released progressively.

Furthermore, we will provide detailed instructions on how to use LLM2CLIP to fine-tune your own CLIP models in about a week—please stay tuned!
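Once both encoders emit same-width features, the retrieval test described above reduces to a cosine-similarity matrix. A minimal sketch with random stand-in features (the sizes and variable names are illustrative, not the released models' API):

```python
import numpy as np

rng = np.random.default_rng(1)
n_texts, n_images, dim = 4, 4, 1280           # illustrative sizes
text_feats = rng.normal(size=(n_texts, dim))  # stand-in for projected LLM text features
img_feats = rng.normal(size=(n_images, dim))  # stand-in for vision-tower features

# L2-normalize so dot products equal cosine similarities
text_feats /= np.linalg.norm(text_feats, axis=1, keepdims=True)
img_feats /= np.linalg.norm(img_feats, axis=1, keepdims=True)

sim = text_feats @ img_feats.T                # (n_texts, n_images) similarity matrix
best_image_per_text = sim.argmax(axis=1)      # text -> image retrieval
best_text_per_image = sim.argmax(axis=0)      # image -> text retrieval
```

Recall@1 is then just the fraction of rows (or columns) whose argmax lands on the ground-truth pairing.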

@konioy

konioy commented Nov 21, 2024

Looking forward to the EVA series of models.

@JENNSHIUAN

How about the text encoder for the EVA series?

@Yif-Yang
Collaborator

We will release it soon, possibly today.

@ztwaaa

ztwaaa commented Nov 22, 2024

Thank you very much for your work! May I ask when the data preprocessing will be released?

@Yif-Yang
Collaborator

> Thank you very much for your work! May I ask when the data preprocessing will be released?

It should be out around today or tomorrow, I think.

@Yif-Yang
Collaborator

@konioy @JENNSHIUAN We just updated EVA02's PyTorch checkpoint. We will try to upload the safetensors version tomorrow.

@vilhub

vilhub commented Nov 22, 2024

Great work, thanks a lot @Yif-Yang! For the EVA02 model, how does one encode the text? It doesn't seem to have the same get_text_features method as the OpenAI-CLIP-based models.

@Yif-Yang
Collaborator

> Great work, thanks a lot @Yif-Yang ! For the EVA02 model how does one encode the text? It doesn't seem to have the same get_text_features method like for the OpenAI-CLIP based models.

Will upload ASAP.

@raytrun
Collaborator

raytrun commented Nov 22, 2024

@vilhub We have updated the README in HuggingFace, which now includes usage examples.
