tensor parallelism across multiple GPU's #1092

samanthvishwas · 2023-09-18T06:20:50Z

I am following the code as mentioned in the AWS documentation to host GPT-J-6B using DJL serving

[ https://github.com/aws/amazon-sagemaker-examples/blob/main/advanced_functionality/pytorch_deploy_large_GPT_model/GPT-J-6B-model-parallel-inference-DJL.ipynb]
Providing a tensor parallelism value as 2 in serving.properties creates 2 copies of the model rather than partitioning model layers across two GPU's . This happens irrespective of using a smaller/larger model.

Instance used :
ml.g4dn.12xlarge

frankfliu · 2023-09-21T00:30:25Z

@samanthvishwas
g4dn.12x has 4 gpus, DJLServing will automatically expend to all GPUs. So you see 2 workers (each worker use 2GPUs)

If you only want to load one copy of the model, you can set the following in serving.properties:

load_on_devices=0

sindhuvahinis · 2024-09-03T22:50:24Z

@samanthvishwas Closing the issue. Please open a new one, if you have any more questions.

sindhuvahinis closed this as completed Sep 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tensor parallelism across multiple GPU's #1092

tensor parallelism across multiple GPU's #1092

samanthvishwas commented Sep 18, 2023 •

edited

Loading

frankfliu commented Sep 21, 2023 •

edited

Loading

sindhuvahinis commented Sep 3, 2024

tensor parallelism across multiple GPU's #1092

tensor parallelism across multiple GPU's #1092

Comments

samanthvishwas commented Sep 18, 2023 • edited Loading

frankfliu commented Sep 21, 2023 • edited Loading

sindhuvahinis commented Sep 3, 2024

samanthvishwas commented Sep 18, 2023 •

edited

Loading

frankfliu commented Sep 21, 2023 •

edited

Loading