
how to infer text-img pair demo? #8

Open
WinstonDeng opened this issue Nov 12, 2024 · 18 comments

Comments

@WinstonDeng

Using the official OpenAI text model, the text embedding dimension is 768, which mismatches the LLM2CLIP image embedding dimension of 1280.

```python
from transformers import CLIPModel, CLIPTokenizer

text_model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14-336")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14-336")
inputs = tokenizer(text=texts, padding=True, return_tensors="pt").to(device)
text_features = text_model.get_text_features(**inputs)  # shape: [1, 768]
```
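For context, here is a minimal NumPy sketch of the mismatch (illustrative shapes; the random matrix is only a stand-in for the trained adapter/projector, not the released weights):

```python
import numpy as np

rng = np.random.default_rng(0)
text_feat = rng.normal(size=(1, 768))   # OpenAI CLIP ViT-L/14 text feature dim
img_feat = rng.normal(size=(1, 1280))   # LLM2CLIP image feature dim

# Cosine similarity is undefined across mismatched widths, so a
# (768 -> 1280) projection must be applied to the text features first.
# This random matrix is a stand-in for the trained adapter/projector.
proj = rng.normal(size=(768, 1280)) / np.sqrt(768)
projected = text_feat @ proj            # shape (1, 1280)

cos = float((projected @ img_feat.T) /
            (np.linalg.norm(projected) * np.linalg.norm(img_feat)))
```

With a properly trained projector (rather than this random stand-in), `cos` becomes a meaningful text-image similarity score.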
@Yif-Yang
Collaborator

We will update the README about this today and will let you know once it is added. Thanks for the reminder.

@BIGBALLON

Same question here. By the way, do you have a plan to release the CC-finetuned LLM?

@Yif-Yang
Collaborator

> Same question here. btw, do you have a plan to release CC Finetuned LLM?

We’ll do our best to release it within 24 hours. Thank you for the reminder. If you have any other requests, feel free to let us know. We’re happy to release whatever we can, as long as it complies with safety regulations.

@eek

eek commented Nov 15, 2024

Any updates on this @Yif-Yang?

@BIGBALLON

@eek
The CVPR deadline has just passed, and the authors have likely been rushing for the past few days.
Let's wait a bit; the code will be open-sourced very soon.

@Yif-Yang
Collaborator

Thanks so much for your understanding; we will look at it tonight. 😁

@Yif-Yang
Collaborator

@BIGBALLON You are really kind and thoughtful 🌹🌹🌹

@eek

eek commented Nov 15, 2024

I didn't mean to come across as rude; you did a fantastic job, @Yif-Yang! I'm really curious and excited to play with it, so I'm looking forward to it 😀 I'll wait patiently 😀 Congrats again!

@Yif-Yang
Collaborator

@eek @BIGBALLON @WinstonDeng @mtodd

We have updated the caption contrastive fine-tuned version of Llama3-8B-CC (https://huggingface.co/microsoft/LLM2CLIP-Llama-3-8B-Instruct-CC-Finetuned) to assist with your retrieval experiments and training of your own CLIP models. Additionally, the parameters for our adapter and projector have been made available in our OpenAI ViT-L repository (https://huggingface.co/microsoft/LLM2CLIP-Openai-L-14-336). The retrieval testing methods are documented in the model card for reference.

Our tests show retrieval performance exceeding the results reported in the paper, and we encourage you to try it out.

Regarding the EVA series of models, there have been precision mismatches during the conversion to Hugging Face, which are currently being fixed. Updates will be released progressively.

Furthermore, we will provide detailed instructions on how to use LLM2CLIP to fine-tune your own CLIP models in about a week—please stay tuned!
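Once both encoders emit same-width features, the retrieval test described above reduces to a cosine-similarity matrix. A minimal sketch with random stand-in features (the sizes and variable names are illustrative, not the released models' API):

```python
import numpy as np

rng = np.random.default_rng(1)
n_texts, n_images, dim = 4, 4, 1280           # illustrative sizes
text_feats = rng.normal(size=(n_texts, dim))  # stand-in for projected LLM text features
img_feats = rng.normal(size=(n_images, dim))  # stand-in for vision-tower features

# L2-normalize so dot products equal cosine similarities
text_feats /= np.linalg.norm(text_feats, axis=1, keepdims=True)
img_feats /= np.linalg.norm(img_feats, axis=1, keepdims=True)

sim = text_feats @ img_feats.T                # (n_texts, n_images) similarity matrix
best_image_per_text = sim.argmax(axis=1)      # text -> image retrieval
best_text_per_image = sim.argmax(axis=0)      # image -> text retrieval
```

Recall@1 is then just the fraction of rows (or columns) whose argmax lands on the ground-truth pairing.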

@konioy

konioy commented Nov 21, 2024

Looking forward to the EVA series of models.

@JENNSHIUAN

How about the text encoder for the EVA series?

@Yif-Yang
Collaborator

We will release it soon, possibly today.

@ztwaaa

ztwaaa commented Nov 22, 2024

Thank you very much for your work! May I ask when the data preprocessing will be released?

@Yif-Yang
Collaborator

> Thank you very much for your work! May I ask when the data preprocessing will be released?

It should be out around today or tomorrow, I think.

@Yif-Yang
Collaborator

@konioy @JENNSHIUAN We just updated EVA02's PyTorch checkpoint. We will try to upload the safetensors version tomorrow.

@vilhub

vilhub commented Nov 22, 2024

Great work, thanks a lot @Yif-Yang! For the EVA02 model, how does one encode the text? It doesn't seem to have the same get_text_features method as the OpenAI-CLIP-based models.

@Yif-Yang
Collaborator

> Great work, thanks a lot @Yif-Yang ! For the EVA02 model how does one encode the text? It doesn't seem to have the same get_text_features method like for the OpenAI-CLIP based models.

Will upload ASAP.

@raytrun
Collaborator

raytrun commented Nov 22, 2024

@vilhub We have updated the README in HuggingFace, which now includes usage examples.
