[RLlib] Add support for multi-agent off-policy algorithms in the new API stack. #45182
This is a good point. There are still some "weird" assumptions left in some connectors' logic. We should comb these out and make it clearer when to enter which loop for single-agent vs. multi-agent episodes.

Some of this has to do with the fact that EnvRunners can hold either a SingleAgentRLModule OR a MultiAgentRLModule, but Learners always(!) hold a MultiAgentRLModule. Maybe we should allow Learners to operate on SingleAgentRLModules for simplicity and more transparency. It shouldn't be too hard to fix that on the Learner side.
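The asymmetry described above can be sketched in a few lines. This is a hypothetical, simplified illustration (the class and function names are not RLlib's actual API): the Learner side normalizes whatever module it receives into a multi-agent container keyed by module ID, so the Learner code only ever has to handle one interface.

```python
# Hypothetical sketch of the EnvRunner/Learner asymmetry discussed above.
# EnvRunners may hold a single-agent OR a multi-agent module; the Learner
# always normalizes to a multi-agent container keyed by module_id.

class SingleAgentModule:
    def forward(self, batch):
        # Toy "policy": just sum the batch.
        return {"action": sum(batch)}


class MultiAgentModule:
    """Container mapping module_ids to single-agent modules."""

    def __init__(self, modules):
        self.modules = modules

    def forward(self, multi_batch):
        # Dispatch each per-module batch to the matching sub-module.
        return {
            mid: self.modules[mid].forward(batch)
            for mid, batch in multi_batch.items()
        }


DEFAULT_MODULE_ID = "default_policy"


def as_multi_agent(module):
    """Learner-side normalization: wrap a single-agent module so the
    Learner can always assume the multi-agent interface."""
    if isinstance(module, MultiAgentModule):
        return module
    return MultiAgentModule({DEFAULT_MODULE_ID: module})


# Usage: an EnvRunner's single-agent module becomes a one-entry
# multi-agent module once it reaches the Learner.
ma = as_multi_agent(SingleAgentModule())
print(ma.forward({DEFAULT_MODULE_ID: [1, 2, 3]}))
# → {'default_policy': {'action': 6}}
```

The alternative floated in the comment (Learners operating directly on SingleAgentRLModules) would remove this wrapping step at the cost of two Learner code paths.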
We'll have to see. This might lead to Tune errors: at the very beginning of a run, before any episode is done, Tune may complain that none of the stop criteria (e.g. num_env_steps_sampled_lifetime) can be found in the result dict.
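One defensive pattern for the missing-metric problem: Ray Tune also accepts a callable stop criterion, so a stop function can treat an absent metric as "not met yet" instead of erroring. Below is a minimal sketch in plain Python (no Ray import, so it runs standalone); the function name and threshold are illustrative, not part of this PR.

```python
# Sketch of a defensive stop criterion: if the metric (e.g.
# "num_env_steps_sampled_lifetime") has not been reported yet -- as can
# happen before the first episode finishes -- treat the stop criterion
# as not met rather than raising a missing-key error.

def make_stop_fn(metric, threshold):
    def stop_fn(trial_id, result):
        value = result.get(metric)
        if value is None:
            # Metric not reported yet: keep running.
            return False
        return value >= threshold
    return stop_fn


stop = make_stop_fn("num_env_steps_sampled_lifetime", 100_000)
print(stop("trial_0", {}))                                          # → False
print(stop("trial_0", {"num_env_steps_sampled_lifetime": 5}))       # → False
print(stop("trial_0", {"num_env_steps_sampled_lifetime": 200_000})) # → True
```

Whether Tune's built-in dict-based stoppers tolerate missing keys at startup is exactly the open question in the comment above; the callable form sidesteps it.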