Hello,
I'm deploying Vicuna-33B over 2x A6000s (48 GB each) using TP=2.
It consumes about 37 GB on each card.
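For reference, my launch script looks roughly like this (a minimal sketch; the exact model ID and flags are illustrative and may differ from what I'm actually running):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Minimal sketch of the TP=2 deployment; model ID is illustrative.
model = AutoModelForCausalLM.from_pretrained(
    "lmsys/vicuna-33b-v1.3", torch_dtype=torch.float16
)

# With tp_size=2, DeepSpeed shards each weight tensor evenly (50:50)
# across the two GPUs.
engine = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 2},
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
```

launched with `deepspeed --num_gpus 2 serve.py`.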
I now have a scenario where a separate workload on GPU 0 is consuming 24 GB of VRAM.
Does DeepSpeed currently do anything smart, like a 25:75 tensor slicing, to adapt to the 24 GB + 48 GB of available GPU memory?
I see that tp_shard.py doesn't expose any parameters for this, but I wanted to confirm the behavior.
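To make the question concrete, here's a toy sketch (plain PyTorch, not DeepSpeed internals) of the kind of uneven column shard I have in mind:

```python
import torch

# Toy illustration only (plain PyTorch, not DeepSpeed internals):
# column-shard a linear weight 1:2 instead of the even 1:1 split TP=2 does.
weight = torch.randn(4096, 4096)

ratios = [1, 2]  # proportional to free memory: ~24 GB on gpu0, ~48 GB on gpu1
total = sum(ratios)
cols = [weight.shape[1] * r // total for r in ratios]
cols[-1] = weight.shape[1] - sum(cols[:-1])  # absorb rounding in the last shard

shard_gpu0, shard_gpu1 = torch.split(weight, cols, dim=1)
print(shard_gpu0.shape, shard_gpu1.shape)  # [4096, 1365] and [4096, 2731]
```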
Thank you very much.