Question on HW1 #4

Open
JiaheXu opened this issue Aug 7, 2021 · 1 comment
Comments

JiaheXu commented Aug 7, 2021

In HW1, MLP_policy.py:
If discrete is true, I think the output should be a one-hot vector, or at least, when taking actions, you need to take the argmax in utils.py. I am a newbie in RL and not 100% sure, so please take a look. In HW1 all the problems are continuous, which is perhaps why your code works.
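To make the discrete case concrete, here is a minimal, framework-free sketch of turning per-action scores into an action index, either greedily (argmax) or by sampling. It is not the repository's code: the function names and the example logits are hypothetical, and it assumes the policy network outputs one raw score (logit) per discrete action.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_action(logits):
    """Return the index of the highest-scoring action (argmax)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_action(logits, rng):
    """Draw an action index from the categorical distribution defined by the logits."""
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [0.1, 2.0, -1.0]  # hypothetical network output for 3 actions
    rng = random.Random(0)
    print("greedy:", greedy_action(logits))
    print("sampled:", sample_action(logits, rng))
```

Either way, the environment receives a single integer action rather than the raw vector of scores.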

mantle2048 commented Sep 19, 2021

Hi, I quite agree with you.

Thanks to the author for the great code for the CS285 Fall 2020 homework!

There is a small problem in hw1.

In hw1, in cs285/policies/MLP_policy.py, the author used a deterministic policy (actions are output directly from self.mean_net).

This is incorrect, because the original code in cs285/policies/MLP_policy.py also sets self.logstd, which is part of a stochastic policy.

In addition, I found that after changing the author's code from a deterministic policy to a stochastic policy, the performance of BC on Ant-v2 dropped from 4k to 1.4k.

I think 1.4k is how BC should be expected to perform.
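The difference between the two policies discussed above can be sketched roughly as follows. This is a plain-Python illustration, not the homework code: the function names are hypothetical, and it assumes a diagonal Gaussian policy whose mean comes from the network and whose per-dimension log standard deviation is a separate parameter.

```python
import math
import random


def deterministic_action(mean):
    """Deterministic policy: use the predicted mean directly as the action."""
    return list(mean)


def stochastic_action(mean, logstd, rng):
    """Stochastic policy: sample each action dimension from N(mean, exp(logstd)^2)."""
    return [rng.gauss(m, math.exp(s)) for m, s in zip(mean, logstd)]


if __name__ == "__main__":
    mean = [0.5, -0.2]    # hypothetical network output for a 2-D action
    logstd = [-1.0, -1.0]  # hypothetical learned log standard deviations
    rng = random.Random(0)
    print("deterministic:", deterministic_action(mean))
    print("stochastic:", stochastic_action(mean, logstd, rng))
```

With a very negative logstd the sampled action collapses onto the mean, so the deterministic policy can be seen as the zero-noise limit of the stochastic one.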
