[RLlib] Add support for multi-agent off-policy algorithms in the new API stack. #45182

Conversation

@simonsays1980 (Collaborator) commented May 7, 2024

Why are these changes needed?

Off-policy algorithms were moved from the old to the new stack, but so far they worked only in single-agent mode. We were missing a standard Learner API for the new stack, which is now available: every LearnerGroup now receives a List[EpisodeType] for updates.

This PR adds support for multi-agent setups in off-policy algorithms using the new MultiAgentEpisodeReplayBuffer. It includes all modifications necessary for "independent" sampling, together with a multi-agent SAC example to be added to the learning_tests (a rough sketch of such a setup is shown below).
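
For illustration, a rough sketch of what such a multi-agent off-policy setup could look like on the new API stack. The env name, policy IDs, mapping function, and the exact flag names for enabling the new stack are assumptions for this sketch, not taken verbatim from the PR's learning test:

from ray.rllib.algorithms.sac import SACConfig

# "multi_agent_pendulum" is assumed to be a multi-agent env registered
# beforehand (e.g. via tune.register_env); SAC needs continuous actions.
config = (
    SACConfig()
    # Flag names for switching on the new API stack are an assumption here.
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment("multi_agent_pendulum")
    # "Independent" multi-agent training: each agent maps to its own module.
    .multi_agent(
        policies={"p0", "p1"},
        policy_mapping_fn=lambda agent_id, episode, **kwargs: f"p{agent_id}",
    )
    # The episode-based multi-agent buffer this PR wires in.
    .training(
        replay_buffer_config={"type": "MultiAgentEpisodeReplayBuffer"},
    )
)

algo = config.build()
print(algo.train())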

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

sven1977 and others added 17 commits April 29, 2024 13:28
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
…ge_episode_buffers_to_return_episode_lists_from_sample
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
…hat held DQN off from learning. In addition fixed some minor bugs.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ists_from_sample

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…rror occurred in CI tests.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ists_from_sample

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…r readability of the test code for users (we describe the connector to add the 'NEXT_OBS' to the batch).

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ependent'-mode sampling. Added multi-agent example for SAC and modified 'compute_gradients' in 'SACTorchLearner' to deal with MARLModules. Commented 2 assertions in connectors that avoided multi-agent setups with 'SingleAgentEpisode's.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
@@ -33,7 +33,10 @@ def __call__(
         # to a batch structure of:
         # [module_id] -> [col0] -> [list of items]
         if is_marl_module and column in rl_module:
-            assert is_multi_agent
+            # assert is_multi_agent
+            # TODO (simon, sven): Check, if we need for other cases this check.
Contributor comment:

This is a good point. There are still some "weird" assumptions left in some connectors' logic. We should comb these out and make it clearer when to go into which loop with SingleAgentEpisodes vs. MultiAgentEpisodes.

Some of this stuff has to do with the fact that EnvRunners can either have a SingleAgentRLModule OR a MultiAgentRLModule, but Learners always(!) have a MultiAgentModule. Maybe we should have Learners that operate on SingleAgentRLModules for simplicity and more transparency. It shouldn't be too hard to fix that on the Learner side.
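
To make the point above concrete, here is a hedged sketch of the kind of branching such a connector piece ends up doing. The helper and its batch layouts are illustrative only, not RLlib's actual connector implementation; the MultiAgentRLModule import path reflects the class name in use at the time of this PR:

from ray.rllib.core.rl_module.marl_module import MultiAgentRLModule

def add_column_to_batch(batch, rl_module, column, items):
    # Illustrative only: EnvRunners may hold a SingleAgentRLModule OR a
    # MultiAgentRLModule, while Learners always hold a multi-agent module,
    # so the same connector logic has to handle two batch layouts.
    if isinstance(rl_module, MultiAgentRLModule):
        # Multi-agent layout: [module_id] -> [column] -> [list of items].
        for module_id, module_items in items.items():
            batch.setdefault(module_id, {}).setdefault(column, []).extend(
                module_items
            )
    else:
        # Single-agent layout: [column] -> [list of items].
        batch.setdefault(column, []).extend(items)
    return batch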

…sode'

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…agent off-policy algorithms.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
@sven1977 sven1977 changed the title [RLlib] - Add support for multi-agent off-policy algorithms in the new API stack [RLlib] Add support for multi-agent off-policy algorithms in the new API stack. May 10, 2024
        # If no episodes at all, log NaN stats.
        if len(self._done_episodes_for_metrics) == 0:
            self._log_episode_metrics(np.nan, np.nan, np.nan)
            # TODO (simon): This results in hundreds of warnings in the logs
Contributor comment:

We'll have to see. This might lead to Tune errors in the sense that at the beginning, if no episode is done yet, Tune will complain that none of the stop criteria (e.g. num_env_steps_sampled_lifetime) can be found in the result dict.
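
For context, a minimal sketch of how such a stop criterion typically reaches Tune. Whether the key sits at the top level of the result dict, and whether logging NaN placeholders is enough to satisfy the stopper, is exactly the open question here; treat the key path and values as assumptions:

from ray import train, tune
from ray.rllib.algorithms.sac import SACConfig

config = SACConfig().environment("Pendulum-v1")

tuner = tune.Tuner(
    "SAC",
    param_space=config.to_dict(),
    run_config=train.RunConfig(
        # Tune looks up this key in every reported result; if early results
        # contain neither the key nor a NaN placeholder, the stopper errors out.
        stop={"num_env_steps_sampled_lifetime": 100_000},
    ),
)
results = tuner.fit()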

@sven1977 (Contributor) left a comment:

LGTM now.

I do have one concern about removing the NaN from the MultiAgentEnvRunner, but we can move it back or find a better solution (maybe initialize the most common stop keys already in algo) later.

rllib/BUILD (review thread: outdated, resolved)
sven1977 and others added 10 commits May 14, 2024 11:07
Signed-off-by: Sven Mika <sven@anyscale.io>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…footprint of the class. Changes to 'MultiAgentEpisodeReplayBuffer' to reduce memory usage and increase performance.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…sed.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…irection single-agent buffer. Memory leak should be fixed with this commit.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…:simonsays1980/ray into change_ma_buffer_to_use_list_of_episodes

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
@sven1977 sven1977 enabled auto-merge (squash) May 16, 2024 14:55
@github-actions bot added the "go" label (add ONLY when ready to merge, run all tests) May 16, 2024
Signed-off-by: sven1977 <svenmika1977@gmail.com>
@github-actions github-actions bot disabled auto-merge May 17, 2024 04:07
@sven1977 sven1977 enabled auto-merge (squash) May 17, 2024 05:50
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…:simonsays1980/ray into change_ma_buffer_to_use_list_of_episodes

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
@github-actions github-actions bot disabled auto-merge May 17, 2024 14:42
@sven1977 sven1977 merged commit 7fb0ce1 into ray-project:master May 24, 2024
6 checks passed