OPT Finetune CreateStateParallel KeyError with PipeshardParallel #822
Hello,

I'm finetuning OPT using https://github.com/alpa-projects/alpa/tree/main/examples/opt_finetune with `FollowParallel` and `CreateStateParallel` to offload the peak CPU memory to my devices. This should also resolve issue #811. However, when `method` contains `PipeshardParallel`, the following error is raised. The same error is not observed when `PipeshardParallel` degenerates into `ShardParallel` (i.e. `pipeline_parallel=1`), nor with data_parallel + operator_parallel only.
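For reference, here is a minimal sketch of how my script wires the three methods together. The function names (`train_step`, `create_state`, `eval_step`), the `train_batch` variable, and all hyperparameter values are placeholders, and the exact `CreateStateParallel`/`FollowParallel` signatures should be checked against the Alpa version in use:

```python
import alpa

# Pipeline + intra-operator parallelism for the training step.
# num_micro_batches and layer_num are placeholder values.
method = alpa.PipeshardParallel(
    num_micro_batches=16,
    layer_option=alpa.AutoLayerOption(layer_num=8))

p_train_step = alpa.parallelize(train_step, method=method)

# Create the sharded train state directly on the devices, following the
# sharding/placement chosen for p_train_step, so the full state is never
# materialized on the host CPU.
p_create_state = alpa.parallelize(
    create_state,
    method=alpa.CreateStateParallel(p_train_step, train_batch))

# Let the eval step reuse the train step's parallelization plan.
p_eval_step = alpa.parallelize(
    eval_step,
    method=alpa.FollowParallel(p_train_step))

state = p_create_state()  # the KeyError is raised here when method is PipeshardParallel
```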
To Reproduce

I attach my modified `run_clm_flax.py` (relevant lines: 725, 829): https://github.com/zw123han/alpa/blob/main/examples/opt_finetune/run_clm_flax.py

You can directly replace the version from https://github.com/alpa-projects/alpa/tree/main/examples/opt_finetune with it and reproduce the error with the existing `run_2.7b_pipe.sh` launch script.
As an aside, do you think it would be helpful to merge this as the default OPT finetune script once it stabilizes, to avoid the CPU bottleneck for large models?