Add test that Gymnasium and MO-Gymnasium envs match #90
@ffelten or @LucasAlegre It looks like most of the MO-Gymnasium reward vectors do not match the Gymnasium versions by default. Is this expected?
Lol, just what I said in my comment. Some environments are indeed different. For example, we usually remove the scaling factors because they are unnecessary when the rewards are not aggregated into a single scalar. Another example is environments that add an extra component to the reward, e.g. mountaincar penalizes for changing directions. EDIT: still, I think these kinds of tests are relevant, I'd just ignore the reward tests.
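To make the point about scaling factors concrete: a single-objective reward can often be recovered from a reward vector as a weighted sum of its components. The snippet below is only a toy sketch. The helper name `scalarize`, the component values, and the weights are all made up for illustration and are not taken from any MO-Gymnasium environment.

```python
def scalarize(reward_vector, weights):
    """Weighted sum of reward components (linear scalarization)."""
    if len(reward_vector) != len(weights):
        raise ValueError("reward_vector and weights must have the same length")
    return sum(w * r for w, r in zip(weights, reward_vector))


# Hypothetical unscaled components, as a multi-objective env might return them,
# e.g. [forward progress, negative control cost].
mo_reward = [2.0, -4.0]

# Hypothetical scaling factors that a single-objective env would bake in.
weights = [1.0, 0.5]

scalar_reward = scalarize(mo_reward, weights)
print(scalar_reward)
```

Under this view the scaling factors live in the weights rather than in the environment, which is why dropping them from the vector reward loses no information.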
Amazing, I'm glad this isn't actually an issue then.
Yes, I believe for most of them (if not all) that is indeed possible. I will try to document this somewhere. Any suggestions?
In the documentation? Are you aware that
I will add it to the pydoc of each environment class.
Yes, Reacher is probably the environment with the most changes from the original. We also changed the .xml file to add more targets to the environment. This one cannot be mapped to the Gymnasium Reacher.
Thanks, this will be really helpful for me. I want to learn the reward-vector equivalent of the standard agent, which requires the MO version to be equivalent to the standard version.
@pseudo-rnd-thoughts see: I could not map the rewards for Humanoid and Walker2d because we added the healthy_reward to all objectives, so we cannot scale it separately. What we could do is create versions of them with the healthy_reward modeled as a separate objective. What do you think? @ffelten
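A toy calculation may help show why folding healthy_reward into every objective gets in the way of the mapping. All numbers below are made up, and the two-objective layout is an assumption for illustration rather than the actual Humanoid or Walker2d code. The idea is that a bonus added to every component ends up multiplied by the sum of the weights, whereas a separate component can be weighted on its own.

```python
def dot(weights, values):
    """Plain weighted sum, used here as a linear scalarization."""
    return sum(w * v for w, v in zip(weights, values))


# Made-up per-step quantities (not from any real environment).
forward, ctrl_cost, healthy = 1.0, 2.0, 1.0
ctrl_weight = 0.5  # hypothetical scaling factor on the control cost

# Single-objective style reward: the healthy bonus is counted once.
so_reward = forward - ctrl_weight * ctrl_cost + healthy

# Assumed layout A: healthy bonus folded into every objective.
mo_shared = [forward + healthy, -ctrl_cost + healthy]
shared_scalar = dot([1.0, ctrl_weight], mo_shared)

# Assumed layout B: healthy bonus modeled as its own objective.
mo_separate = [forward, -ctrl_cost, healthy]
separate_scalar = dot([1.0, ctrl_weight, 1.0], mo_separate)

print(so_reward, shared_scalar, separate_scalar)
```

In layout A the healthy bonus is effectively scaled by `1.0 + ctrl_weight`, so the weighted sum overshoots the single-objective value. In layout B the bonus has its own weight and the two values agree.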
I find it odd to create environments for the sake of testing lol
It is not only for testing, since I believe the relative weighting of the healthy_reward might also induce new trade-offs. I saw some papers using it. The reason I did not include it was to focus on the velocity/energy trade-off, which is clearer.
I figured out how to scale the healthy_reward so that Humanoid and Walker2d can be mapped to the original Gymnasium env. But this will make the current results not reproducible, so I will implement these changes in v5 (PR #85).
@ffelten or @LucasAlegre can we merge this PR and handle the reward vector equivalence in another PR?
This PR adds a test for all MO-Gymnasium envs that are also contained in Gymnasium to see if the two are still equivalent using the `check_environments_match` function.
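For readers unfamiliar with this kind of test, here is a dependency-free sketch of the idea: step two environments with the same seed and the same actions, then compare what they return, optionally skipping rewards. It does not call `check_environments_match` itself. The classes `ToyEnv` and `ToyMOEnv` and the helper `rollouts_match` are hypothetical stand-ins with a simplified `reset`/`step` interface, not the Gymnasium API.

```python
import random


class ToyEnv:
    """Hypothetical single-objective env: a noisy 1-D walker with a scalar reward."""

    def reset(self, seed):
        self.rng = random.Random(seed)
        self.pos = 0.0
        return self.pos

    def step(self, action):
        self.pos += action + self.rng.uniform(-0.1, 0.1)
        reward = action - 0.5 * action ** 2  # scaling factor baked in
        terminated = abs(self.pos) > 5.0
        return self.pos, reward, terminated


class ToyMOEnv(ToyEnv):
    """Hypothetical multi-objective twin: same dynamics, unscaled vector reward."""

    def step(self, action):
        obs, _, terminated = super().step(action)
        return obs, [action, -(action ** 2)], terminated


def rollouts_match(env_a, env_b, num_steps, seed=0, skip_rew=False):
    """Return True if both envs agree on every compared output of a rollout."""
    action_rng = random.Random(seed)
    if env_a.reset(seed) != env_b.reset(seed):
        return False
    for _ in range(num_steps):
        action = action_rng.uniform(-1.0, 1.0)
        obs_a, rew_a, term_a = env_a.step(action)
        obs_b, rew_b, term_b = env_b.step(action)
        if obs_a != obs_b or term_a != term_b:
            return False
        if not skip_rew and rew_a != rew_b:
            return False
        if term_a:  # start a fresh episode in both envs
            env_a.reset(seed)
            env_b.reset(seed)
    return True


print(rollouts_match(ToyEnv(), ToyMOEnv(), 100, skip_rew=True))
print(rollouts_match(ToyEnv(), ToyMOEnv(), 100, skip_rew=False))
```

With rewards skipped the two toy envs match, because their dynamics are identical. With rewards compared they differ, since one returns a scalar and the other a vector, which mirrors the situation discussed in this thread.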