Hi👋, I'm a PhD candidate (2021.09-now) at Tianjin University, China. My major interests include video understanding, sign language understanding and multi-modal learning. I hope to help more people benefit from general computer vision techniques. For more information, please visit www.hulianyu.top. Feel free to contact me via hly2021@tju.edu.cn.
- We release iLLaVA, an efficient method for large vision-language models. By merging redundant visual tokens in certain layers, it achieves about 2× throughput and 1.7×-2× memory reduction with comparable performance (a toy sketch of the token-merging idea follows this list).
- We release Deep Correlated Prompting, which tackles missing-modality scenarios with three different types of prompting, substantially improving the robustness of large vision-language models.
- We release CorrNet+, a unified model that achieves superior performance on both continuous sign language recognition and sign language translation using only RGB inputs.
- We release DSTA-SLR, which performs sign language recognition (SLR) with skeleton inputs alone, yet achieves accuracy comparable to RGB-based recognition while running much faster.
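
For readers unfamiliar with visual token merging, below is a minimal, hypothetical sketch (not the actual iLLaVA implementation) of how redundant visual tokens can be merged by cosine similarity. The function name, tensor shapes, and the pair-averaging rule are illustrative assumptions only.

```python
# Minimal, hypothetical sketch of similarity-based visual token merging
# (illustrative only; NOT the iLLaVA implementation).
import torch
import torch.nn.functional as F

def merge_redundant_tokens(tokens: torch.Tensor, r: int) -> torch.Tensor:
    """tokens: (N, D) visual tokens from one layer; r: number of pairs to merge."""
    x = F.normalize(tokens, dim=-1)
    src, dst = tokens[0::2].clone(), tokens[1::2].clone()    # bipartite split
    sim = x[0::2] @ x[1::2].T                                # cosine similarities
    scores, match = sim.max(dim=-1)                          # best partner for each src token
    merge_idx = scores.topk(min(r, scores.numel())).indices  # most redundant src tokens
    keep = torch.ones(src.shape[0], dtype=torch.bool)
    for i in merge_idx.tolist():
        dst[match[i]] = (dst[match[i]] + src[i]) / 2         # merge a pair by averaging
        keep[i] = False
    # Token order / positional bookkeeping is omitted for brevity.
    return torch.cat([src[keep], dst], dim=0)                # N - r tokens remain

# Example: 576 visual tokens of dim 1024, merge 64 pairs -> 512 tokens
merged = merge_redundant_tokens(torch.randn(576, 1024), r=64)
print(merged.shape)  # torch.Size([512, 1024])
```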