Are there fine-tuning and inference scripts available for int4 quantization in bloom-7b? Is it possible to limit the GPU memory usage to within 10GB? #94

dizhenx · 2023-05-31T05:56:37Z

Where can I download bloom-7b?
I noticed that int8 quantization is available, but is there an option for int4 quantization?
What is the memory overhead for int4 and int8 when fine-tuning with LoRA or P-Tuning? Are there any fine-tuning scripts available?
Additionally, are there inference scripts available for int4 quantization? How much GPU memory is required for int4 and int8 inference, respectively?

mayank31398 · 2023-05-31T22:35:53Z

This is not possible.
But you might want to take a look at the QLoRA paper and its code: https://github.com/artidoro/qlora
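
For the memory question raised above, a back-of-the-envelope estimate of weight storage may help frame the 10GB target. This is only a sketch: the parameter count of 7e9 is an assumption read off the "7b" in the model name, and the numbers cover the weights alone. Quantization constants, activations, the KV cache, LoRA adapter weights with their gradients and optimizer state, and the CUDA context all add to the real footprint.

```python
# Rough weight-only memory estimate for a ~7B-parameter model.
# ASSUMPTION: 7e9 parameters, taken from the "7b" in the model name;
# the actual parameter count of the checkpoint may differ.
NUM_PARAMS = 7_000_000_000

# Bits used to store each weight under each format.
BITS_PER_WEIGHT = {"fp16": 16, "int8": 8, "int4": 4}


def weight_memory_gib(num_params: int, bits: int) -> float:
    """Return the storage needed for the weights alone, in GiB."""
    return num_params * bits / 8 / 2**30


estimates = {
    name: weight_memory_gib(NUM_PARAMS, bits)
    for name, bits in BITS_PER_WEIGHT.items()
}

for name, gib in estimates.items():
    print(f"{name}: ~{gib:.1f} GiB for weights only")
```

Under these assumptions the weights come to roughly 13 GiB in fp16, 6.5 GiB in int8, and 3.3 GiB in int4. Whether a full fine-tuning or inference run stays under 10GB depends on the overheads listed above, which this estimate does not model.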
