Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

how to get the train file? #25

Open
jiinhui opened this issue Jul 10, 2024 · 2 comments
Open

how to get the train file? #25

jiinhui opened this issue Jul 10, 2024 · 2 comments

Comments

@jiinhui
Copy link

jiinhui commented Jul 10, 2024

I can't find the train data files of "BLIVA/bliva/data/llava/bliva_llava_150k.json" and "BLIVA/bliva/data/ocrVQA/cleaned_train_dataset.json". Can you tell me how to download them? Thanks!

@gordonhu608
Copy link
Collaborator

For ocrVQA train data, you can refer to this issue, #12. The paper should mention we used a prompt "OCR tokens: {}" to add OCR tokens directly after the question. As for bliva_llava_150k, I think it's the version of converting llava150k to single-turn chat history. Check the details in paper.

@jiinhui
Copy link
Author

jiinhui commented Jul 11, 2024

For ocrVQA train data, you can refer to this issue, #12. The paper should mention we used a prompt "OCR tokens: {}" to add OCR tokens directly after the question. As for bliva_llava_150k, I think it's the version of converting llava150k to single-turn chat history. Check the details in paper.

I can't find the details about converting llava150k to single-turn chat in your paper. I will try to review InstructBLIP for more details about the dataset.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants