Hi👋, I'm a PhD candidate (2021.09-now) at Tianjin University, China. My major interests include video understanding, sign language understanding and multi-modal learning. I hope to help more people benefit from general computer vision techniques. For more information, please visit www.hulianyu.top. Feel free to contact me via hly2021@tju.edu.cn.
- We release iLLaVA, an efficient method for large vision-language models. By merging redundant visual tokens in certain layers, it achieves about 2× throughput and 1.7×-2× memory reduction with comparable performance (a toy sketch of the token-merging idea follows this list).
- We release Deep Correlated Prompting, which tackles missing-modality scenarios with three different types of prompting, substantially improving the robustness of large vision-language models.
- We release CorrNet+, a unified model that achieves superior performance on both continuous sign language recognition and sign language translation using only RGB inputs.
- We release DSTA-SLR, which performs sign language recognition (SLR) with skeleton inputs alone, yet achieves accuracy comparable to RGB-based recognition while running much faster.
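
For readers unfamiliar with visual token merging, below is a minimal, hypothetical sketch (not the actual iLLaVA implementation) of how redundant visual tokens can be merged by cosine similarity. The function name, tensor shapes, and the pair-averaging rule are illustrative assumptions only.

```python
# Minimal, hypothetical sketch of similarity-based visual token merging
# (illustrative only; NOT the iLLaVA implementation).
import torch
import torch.nn.functional as F

def merge_redundant_tokens(tokens: torch.Tensor, r: int) -> torch.Tensor:
    """tokens: (N, D) visual tokens from one layer; r: number of pairs to merge."""
    x = F.normalize(tokens, dim=-1)
    src, dst = tokens[0::2].clone(), tokens[1::2].clone()    # bipartite split
    sim = x[0::2] @ x[1::2].T                                # cosine similarities
    scores, match = sim.max(dim=-1)                          # best partner for each src token
    merge_idx = scores.topk(min(r, scores.numel())).indices  # most redundant src tokens
    keep = torch.ones(src.shape[0], dtype=torch.bool)
    for i in merge_idx.tolist():
        dst[match[i]] = (dst[match[i]] + src[i]) / 2         # merge a pair by averaging
        keep[i] = False
    # Token order / positional bookkeeping is omitted for brevity.
    return torch.cat([src[keep], dst], dim=0)                # N - r tokens remain

# Example: 576 visual tokens of dim 1024, merge 64 pairs -> 512 tokens
merged = merge_redundant_tokens(torch.randn(576, 1024), r=64)
print(merged.shape)  # torch.Size([512, 1024])
```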