OPT Finetune CreateStateParallel KeyError with PipeshardParallel #822
Hello,

I'm finetuning OPT using https://github.com/alpa-projects/alpa/tree/main/examples/opt_finetune with `FollowParallel` and `CreateStateParallel` to offload the peak CPU memory to my devices. This should also resolve issue #811. However, when `method` contains `PipeshardParallel`, the following error is raised. The same error is not observed when `PipeshardParallel` degenerates into `ShardParallel` (i.e. `pipeline_parallel=1`), nor with data_parallel + operator_parallel only.
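For reference, here is a minimal sketch of how my script wires the three methods together. The function names (`train_step`, `create_state`, `eval_step`), the `train_batch` variable, and all hyperparameter values are placeholders, and the exact `CreateStateParallel`/`FollowParallel` signatures should be checked against the Alpa version in use:

```python
import alpa

# Pipeline + intra-operator parallelism for the training step.
# num_micro_batches and layer_num are placeholder values.
method = alpa.PipeshardParallel(
    num_micro_batches=16,
    layer_option=alpa.AutoLayerOption(layer_num=8))

p_train_step = alpa.parallelize(train_step, method=method)

# Create the sharded train state directly on the devices, following the
# sharding/placement chosen for p_train_step, so the full state is never
# materialized on the host CPU.
p_create_state = alpa.parallelize(
    create_state,
    method=alpa.CreateStateParallel(p_train_step, train_batch))

# Let the eval step reuse the train step's parallelization plan.
p_eval_step = alpa.parallelize(
    eval_step,
    method=alpa.FollowParallel(p_train_step))

state = p_create_state()  # the KeyError is raised here when method is PipeshardParallel
```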
To Reproduce

I attach my modified `run_clm_flax.py` (relevant lines: 725, 829): https://github.com/zw123han/alpa/blob/main/examples/opt_finetune/run_clm_flax.py

You can directly replace the version from https://github.com/alpa-projects/alpa/tree/main/examples/opt_finetune with it and reproduce the error with the existing `run_2.7b_pipe.sh` launch script.
As an aside, do you think it would be helpful to merge this as the default OPT finetune script once it stabilizes, to avoid the CPU bottleneck for large models?