[RLlib] Add support for multi-agent off-policy algorithms in the new API stack. #45182
Conversation
…ge_episode_buffers_to_return_episode_lists_from_sample
…hat held DQN off from learning. In addition fixed some minor bugs. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ists_from_sample Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…rror occurred in CI tests. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ists_from_sample Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…r readability of the test code for users (we describe the connector to add the 'NEXT_OBS' to the batch). Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ependent'-mode sampling. Added multi-agent example for SAC and modified 'compute_gradients' in 'SACTorchLearner' to deal with MARLModules. Commented 2 assertions in connectors that avoided multi-agent setups with 'SingleAgentEpisode's. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
@@ -33,7 +33,10 @@ def __call__(
         # to a batch structure of:
         # [module_id] -> [col0] -> [list of items]
         if is_marl_module and column in rl_module:
-            assert is_multi_agent
+            # assert is_multi_agent
+            # TODO (simon, sven): Check if we need this check for other cases as well.
This is a good point. There are still some "weird" assumptions left in some connectors' logic.
We should comb these out and make it clearer when to go into which loop with SingleAgentEpisodes vs. MultiAgentEpisodes.
Some of this has to do with the fact that EnvRunners can hold either a SingleAgentRLModule OR a MultiAgentRLModule, but Learners always(!) hold a MultiAgentRLModule. Maybe we should have Learners that operate on SingleAgentRLModules for simplicity and more transparency. It shouldn't be too hard to fix that on the Learner side.
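For orientation, here is a minimal sketch of the kind of branching this thread is about, under the assumption that the connector's data dict is keyed either by column names (single-agent) or by module IDs (multi-agent). Only the names is_marl_module, rl_module, and the membership check mirror the diff above; the helper itself is hypothetical and not RLlib's actual implementation.

```python
# Illustrative sketch only, not RLlib's connector code: the incoming `data`
# dict is assumed to be keyed either by column names (single-agent) or by
# module IDs (multi-agent), and we rebuild it into the
# [module_id] -> [column] -> [list of items] layout mentioned in the diff.
def restructure_batch(data, rl_module, is_marl_module):
    batch = {}
    for key, value in data.items():
        if is_marl_module and key in rl_module:
            # Multi-agent case: `key` is a module ID and `value` maps
            # columns to lists of items.
            module_batch = batch.setdefault(key, {})
            for column, items in value.items():
                module_batch.setdefault(column, []).extend(items)
        else:
            # Single-agent case: `key` is already a column name and
            # `value` is a list of items.
            batch.setdefault(key, []).extend(value)
    return batch
```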
rllib/utils/replay_buffers/prioritized_episode_replay_buffer.py (outdated review thread, resolved)
…sode' Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…agent off-policy algorithms. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
# If no episodes at all, log NaN stats.
if len(self._done_episodes_for_metrics) == 0:
    self._log_episode_metrics(np.nan, np.nan, np.nan)
# TODO (simon): This results in hundreds of warnings in the logs
We'll have to see. This might lead to Tune errors in the sense that at the beginning, if no episode is done yet, Tune will complain that none of the stop criteria (e.g. num_env_steps_sampled_lifetime) can be found in the result dict.
LGTM now.
I do have one concern about removing the NaN from the MultiAgentEnvRunner, but we can move it back or find a better solution (maybe initialize the most common stop keys already in algo) later.
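To make the Tune concern concrete, here is a hedged sketch: if the first results reported to Tune contain only NaN episode metrics and the stop-criterion key has not been reported yet, Tune may complain that the criterion cannot be found. The stop key num_env_steps_sampled_lifetime is the one named above; the environment, algorithm registration name, and step count are illustrative, not taken from this PR.

```python
# Hedged sketch of a Tune run whose stop criterion must be present in the
# reported results; early iterations without any finished episodes may only
# report NaN metrics, which is where the warning/complaint can come from.
from ray import tune
from ray.rllib.algorithms.sac import SACConfig

config = SACConfig().environment("Pendulum-v1")  # illustrative env choice

tune.run(
    "SAC",
    config=config.to_dict(),
    stop={"num_env_steps_sampled_lifetime": 100_000},
)
```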
Signed-off-by: Sven Mika <sven@anyscale.io>
…footprint of the class. Changes to 'MultiAgentEpisodeReplayBuffer' to reduce memory usage and increase performance. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…sed. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…irection single-agent buffer. Memory leak should be fixed with this commit. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…:simonsays1980/ray into change_ma_buffer_to_use_list_of_episodes Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ge_ma_buffer_to_use_list_of_episodes
Signed-off-by: sven1977 <svenmika1977@gmail.com>
…:simonsays1980/ray into change_ma_buffer_to_use_list_of_episodes Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Why are these changes needed?
Off-policy algorithms have been moved from the old to the new API stack, but so far they only worked in single-agent mode. The standard Learner API for the new stack that we were missing is now available: any LearnerGroup now receives a List[EpisodeType] for updates.
This PR adds support for multi-agent setups in off-policy algorithms using the new MultiAgentEpisodeReplayBuffer. It includes all modifications necessary for "independent" sampling, plus a multi-agent SAC example to be added to the learning_tests.
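As a rough illustration of what this enables, the sketch below configures a multi-agent SAC with the new episode replay buffer. It is not the learning test added in this PR: the environment ID, policy names, and mapping function are hypothetical, and depending on the Ray version the new API stack may additionally need to be enabled via the corresponding AlgorithmConfig flag.

```python
# Hedged sketch of a multi-agent SAC setup using the buffer from this PR;
# only the class name MultiAgentEpisodeReplayBuffer comes from the PR
# description, everything else is illustrative.
from ray.rllib.algorithms.sac import SACConfig

config = (
    SACConfig()
    .environment("my_multi_agent_env")  # hypothetical registered env ID
    .multi_agent(
        policies={"policy_0", "policy_1"},
        policy_mapping_fn=lambda agent_id, episode, **kwargs: (
            "policy_0" if agent_id == "agent_0" else "policy_1"
        ),
    )
    .training(
        replay_buffer_config={
            # Depending on the version, the buffer can be given as a class
            # or by name; shown by name here.
            "type": "MultiAgentEpisodeReplayBuffer",
        },
    )
)
algo = config.build()
```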
Related issue number
Checks
- I've signed off every commit (using git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.