Multiple 400MB processes on single GPU #20114
Unanswered
changspencer asked this question in DDP / multi-GPU / multi-node
Replies: 0 comments
Hello everyone!
I have a question about something I've been wondering about in DP/DDP behavior. Occasionally, on my runs on a SLURM cluster, I see multiple small (roughly 400 MB) processes placed on a single GPU. I assume this GPU hosts something like the "master" process, but I have no idea why the small processes are necessary, or whether they simply aren't cleaned up after a method finishes running.
Why would these processes show up on a single GPU during training, even though all training processes have already been started? Could this be a SLURM resource management problem or a (personal) programming problem?
Some quick notes:
I can try to provide more details, but I wanted to see whether anyone else has experienced the same situation, possibly for different use cases.
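For anyone trying to compare notes, here is a minimal sketch of how the per-GPU process list could be collected on a node. It assumes `nvidia-smi` is on the PATH and parses its `--query-compute-apps` CSV output. The helper names (`group_by_gpu`, `small_extra_processes`) and the 500 MiB cutoff are hypothetical choices for illustration, not part of any library.

```python
import subprocess
from collections import defaultdict

# Assumption: nvidia-smi is installed on the node being inspected.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=gpu_uuid,pid,used_memory",
    "--format=csv,noheader,nounits",
]


def group_by_gpu(csv_text):
    """Map each GPU UUID to a list of (pid, used MiB) tuples."""
    per_gpu = defaultdict(list)
    for line in csv_text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 3:
            continue  # skip lines that do not have the three requested columns
        gpu_uuid, pid, used = fields
        try:
            per_gpu[gpu_uuid].append((int(pid), int(used)))
        except ValueError:
            continue  # skip rows whose pid or memory is not numeric
    return dict(per_gpu)


def small_extra_processes(per_gpu, max_mib=500):
    """For GPUs hosting more than one process, keep those at or below max_mib.

    max_mib is a hypothetical cutoff chosen to catch the ~400 MB entries.
    """
    result = {}
    for gpu_uuid, procs in per_gpu.items():
        small = [p for p in procs if p[1] <= max_mib]
        if len(procs) > 1 and small:
            result[gpu_uuid] = small
    return result


if __name__ == "__main__":
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    for gpu_uuid, procs in small_extra_processes(group_by_gpu(out)).items():
        print(gpu_uuid, procs)
```

If the small entries share PIDs with the training ranks that are running on the other GPUs, that would suggest those ranks are also holding memory on this GPU, rather than separate leftover processes.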