[RLlib] AlgorithmConfig: Mostly dissolve resources() settings (e.g. num_learner_workers) into learners() and env_runners() methods. #45376
Conversation
LGTM. Makes a lot of things clearer and gives a clean separation of the different classes. What needs to be emphasized more clearly, imo, is that a single Learner can use multiple GPUs, and what num_gpus is correctly used for (local EnvRunner, local Learner, etc.).
num_gpus_per_learner_worker=0, # <- set this to 1, if you have at least 1 GPU
num_cpus_for_local_worker=1,
# `num_learners` to the number of available GPUs for multi-GPU training (and
# `num_gpus_per_learner=1`).
If each Learner can use 1 GPU only, the argument num_gpus_per_learner is imo a bit off. Instead, we might better use something like learner_on_gpu=True/False.
It's not a necessary limit, and we won't error out if the user sets it to >1. It could be, for example, that they have a model that doesn't fit on one GPU. We should leave this option flexible for future pipelining/model-parallelism support, I think.
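For illustration, a minimal sketch of that "more than one GPU per Learner" case, assuming the new `learners()` arguments proposed in this PR (PPOConfig is only used as an example algorithm here):

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .learners(
        num_learners=1,          # a single (remote) Learner worker ...
        num_gpus_per_learner=2,  # ... reserving 2 GPUs, e.g. for a large model
    )
)
```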
doc/source/rllib/rllib-learner.rst
Outdated
num_learner_workers=0 # Set this to greater than 1 to allow for DDP style
# updates.
.learners(
num_learners=0 # Set this to greater than 1 to allow for DDP style updates.
Does this also work with data-distributed learning, i.e. using a large batch and distributing it among different Learners, still synchronously?
That's the idea, yes. If you do num_learners > 1, right now, RLlib automatically performs DDP-style training, splitting the batch automatically into n shards and sending them (synchronized) to the n Learners.
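A minimal sketch of that DDP-style setup, assuming the new `learners()` API from this PR (PPOConfig and the batch size are only illustrative):

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    # With a train batch of 4000 timesteps, each of the 4 Learners should
    # receive roughly a 1000-timestep shard per (synchronized) update.
    .training(train_batch_size=4000)
    .learners(
        num_learners=4,          # 4 Learner workers -> 4 synchronized shards
        num_gpus_per_learner=1,  # one GPU per Learner
    )
)
```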
# Use `num_cpus_for_local_worker` and `num_gpus` for the local worker and
# `num_cpus_per_worker` and `num_gpus_per_worker` for the remote
# workers to determine their CPU/GPU resource needs.
# Use `num_cpus_for_main_process` and `num_gpus` for the local worker and
Users might ask: what if I run the Learner on the main process, do I need num_gpus?
Good question, I haven't really thought about this. We should definitely retire this option; it's misleading.
If you want PPO, for example, to learn on the local Learner worker (num_learners=0) AND use the GPU, I guess we should test whether this already works with num_gpus_per_learner=1. This should, imo, allocate the 1 GPU together with the num_cpus_for_main_process CPUs in the placement group bundle. Let me check ...
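A sketch of the setup to be verified here (an assumption, not confirmed behavior): a local Learner (`num_learners=0`) that should still reserve one GPU, next to the main-process CPUs:

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .learners(
        num_learners=0,          # Learner runs locally, in the main process
        num_gpus_per_learner=1,  # ... and should still reserve 1 GPU there
    )
    .resources(num_cpus_for_main_process=1)  # CPUs for the main-process bundle
)
```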
rllib/algorithms/algorithm_config.py
Outdated
For example, an Algorithm with 2 EnvRunners and 1 Learner (with
1 GPU) will request a placement group with the bundles:
[{"cpu": 1}, {"gpu": 1, "cpu": 1}, {"cpu": 1}, {"cpu": 1}], where the
first bundle is for the local (main Algorithm) process, the secon one
secon -> second
fixed
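For reference, a sketch of the config that the quoted docstring example describes (assuming the new API; the bundle layout in the comment is taken from the docstring text above):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# 2 EnvRunners and 1 Learner with 1 GPU; per the docstring above, this should
# request a placement group with bundles roughly like
#   [{"cpu": 1}, {"gpu": 1, "cpu": 1}, {"cpu": 1}, {"cpu": 1}]
# (main Algorithm process, the Learner, then one bundle per EnvRunner).
config = (
    PPOConfig()
    .env_runners(num_env_runners=2)
    .learners(num_learners=1, num_gpus_per_learner=1)
)
```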
For multi-gpu training, you have to set `num_learners` to > 1 and set
`num_gpus_per_learner` accordingly (e.g. 4 GPUs total and model fits on
1 GPU: `num_learners=4; num_gpus_per_learner=1` OR 4 GPUs total and
model requires 2 GPUs: `num_learners=2; num_gpus_per_learner=2`).
This needs to be emphasized more clearly in the other sections where these arguments are mentioned and in the docs: A single learner can use multiple GPUs.
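For instance, the two 4-GPU setups from the quoted docstring, written out as a sketch (assuming the new `learners()` API; PPOConfig is only an example):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# 4 GPUs total, model fits on 1 GPU -> data-parallel across 4 Learners.
config_small_model = PPOConfig().learners(num_learners=4, num_gpus_per_learner=1)

# 4 GPUs total, model requires 2 GPUs -> 2 Learners with 2 GPUs each.
config_large_model = PPOConfig().learners(num_learners=2, num_gpus_per_learner=2)
```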
AlgorithmConfig: Mostly dissolve resources() settings (e.g. num_learner_workers) into learners() and env_runners() methods. This will increase the clarity of the AlgorithmConfig semantic structure, providing a clearer separation between "Learner workers" and "EnvRunner workers", which were previously sometimes both referred to as "workers".
Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.