Hello,
I'm deploying Vicuna-33B over 2x A6000s (48 GB each) using TP=2.
It consumes about 37 GB on each card.
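For reference, my launch script looks roughly like this (a minimal sketch; the exact model ID and flags are illustrative and may differ from what I'm actually running):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Minimal sketch of the TP=2 deployment; model ID is illustrative.
model = AutoModelForCausalLM.from_pretrained(
    "lmsys/vicuna-33b-v1.3", torch_dtype=torch.float16
)

# With tp_size=2, DeepSpeed shards each weight tensor evenly (50:50)
# across the two GPUs.
engine = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 2},
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
```

launched with `deepspeed --num_gpus 2 serve.py`.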
I now have a scenario where a separate workload on GPU 0 is consuming 24 GB of VRAM.
Does DeepSpeed currently do anything smart, like a 25:75 tensor slicing, to adapt to the 24 GB + 48 GB of available GPU memory?
I see that tp_shard.py doesn't expose any parameters for this, but I wanted to confirm the behavior.
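To make the question concrete, here's a toy sketch (plain PyTorch, not DeepSpeed internals) of the kind of uneven column shard I have in mind:

```python
import torch

# Toy illustration only (plain PyTorch, not DeepSpeed internals):
# column-shard a linear weight 1:2 instead of the even 1:1 split TP=2 does.
weight = torch.randn(4096, 4096)

ratios = [1, 2]  # proportional to free memory: ~24 GB on gpu0, ~48 GB on gpu1
total = sum(ratios)
cols = [weight.shape[1] * r // total for r in ratios]
cols[-1] = weight.shape[1] - sum(cols[:-1])  # absorb rounding in the last shard

shard_gpu0, shard_gpu1 = torch.split(weight, cols, dim=1)
print(shard_gpu0.shape, shard_gpu1.shape)  # [4096, 1365] and [4096, 2731]
```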
Thank you very much.