
Sharding a model checkpoint for deepspeed usage #39

Open

CoderPat opened this issue Dec 5, 2022 · 3 comments

@CoderPat commented Dec 5, 2022

Hey!
I'm using a custom version of this repo to run BLOOM-175B with DeepSpeed and it works great, thank you for this!
I'm thinking of exploring other large models (such as OPT-175B) and was wondering what the process is for creating a pre-sharded, int8 DeepSpeed checkpoint for them, similar to https://huggingface.co/microsoft/bloom-deepspeed-inference-int8.
Is there any documentation or are there example scripts for this?

@mayank31398 (Collaborator)

I am unsure about OPT's compatibility with DeepSpeed.
But if it works, you can simply pass the save_mp_checkpoint_path parameter to the init_inference method.
This will create a pre-sharded fp16 version (assuming it works :) )
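
Roughly, a minimal sketch of what that might look like; the model id, the 8-GPU mp_size, and the output path below are placeholders, and this assumes OPT works with DeepSpeed's kernel injection:

```python
# Sketch: save a pre-sharded fp16 checkpoint while initializing DeepSpeed
# inference. Launch across GPUs with: deepspeed --num_gpus 8 shard_checkpoint.py
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Placeholder model id; swap in the OPT checkpoint you actually want to shard.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-66b",
    torch_dtype=torch.float16,
)

engine = deepspeed.init_inference(
    model,
    mp_size=8,                                 # number of tensor-parallel shards
    dtype=torch.float16,
    replace_with_kernel_inject=True,           # DeepSpeed's fused inference kernels
    save_mp_checkpoint_path="./opt-sharded",   # shards get written here
)
```

DeepSpeed writes a checkpoint description JSON alongside the shards; on later runs that path can be fed back through init_inference's checkpoint argument so each rank loads only its own shard.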

For generating int8 weights (pre-sharded), look at https://github.com/microsoft/DeepSpeedExamples/blob/master/model_compression/gpt2/bash_script/run_zero_quant.sh

This script generates a quantized version of GPT-2, but it uses QAT (quantization-aware training) and therefore requires training.
I haven't personally tried this, though.

@mayank31398 (Collaborator)

Also watch out for #37

@mayank31398 (Collaborator)

If you don't have memory constraints (i.e., you have enough GPUs), I would encourage you to use fp16, since it is faster.
int8/int4 will be much faster once DeepSpeed starts supporting their kernels.
