- VisInContext is an easy way to increase the in-context text length in multi-modality learning.
- This work is complementary to existing approaches for increasing in-context text length, such as FlashAttn and Memory Transformer.
pip install -r requirement.txt
For H100 GPUs, install the following dependencies:
pip install -r requirements_h100.txt
See DATASET.md.
See PRETRAIN.md.
See Evaluation.md.
If you find our work helpful, please consider citing the following paper:
@article{wang2024visincontext,
title={Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning},
author={Wang, Alex Jinpeng and Li, Linjie and Lin, Yiqi and Li, Min and Wang, Lijuan and Shou, Mike Zheng},
journal={NeurIPS},
year={2024}
}
Email: awinyimgprocess at gmail dot com
Thanks to these great works: Open-flamingo, Open-CLIP and WebDataset.