
The inference generation is very slow #16

Open
Alkaiddd opened this issue Feb 27, 2024 · 1 comment

Alkaiddd commented Feb 27, 2024

The inference process is currently quite slow. Are there any methods available to accelerate it?
For the action task, it takes about 9 seconds per sample.

@melights (Collaborator) commented

Hi Alkaiddd,
Thank you for your feedback! This model is not designed for real-time applications, and running inference with a 7B model does pose challenges, especially on less powerful GPUs. We have achieved inference times of 1-2 seconds per sample with batch inference on NVIDIA A100 GPUs. There is definitely room for improvement, for example through quantization, if inference speed matters for your use case.
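
As an illustration of the quantization and batching routes mentioned above, here is a minimal sketch using Hugging Face transformers with bitsandbytes 4-bit loading and simple batched generation. The model id, prompts, and generation settings are placeholders, not this repository's actual loading code, and any accuracy impact of 4-bit weights on the action task would need to be validated separately.

```python
# Minimal sketch: 4-bit quantized loading + batched generation with
# Hugging Face transformers and bitsandbytes. The model id below is a
# hypothetical placeholder, not this repo's checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "your-org/your-7b-model"  # placeholder; substitute the real checkpoint

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit weights cut memory vs. fp16
    bnb_4bit_compute_dtype=torch.float16,  # run the matmuls in fp16 for speed
)

tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for batch padding

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers across available GPUs automatically
)

# Batch several samples together so per-call overhead is amortized,
# which is how the 1-2 s/sample figure on A100s is typically reached.
prompts = ["Describe the next action.", "Describe the next action."]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```

Left-padding plus a defined pad token is what makes batched decoder-only generation line up correctly; with a single-sample loop instead of a batch, most of the 9 s is likely per-call overhead rather than raw compute.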
