Chapter 9: the policy gradient loss function #79

Open
mgt-lya opened this issue Jun 24, 2024 · 1 comment

mgt-lya commented Jun 24, 2024

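In the REINFORCE code implementation, `loss = -log_prob * G` should be the gradient of the loss function, shouldn't it? Why does the code treat it directly as the loss function, call `backward()` on it to compute the gradient, and then update the parameters?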

@Hosen760
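[image: screenshot of the passage being discussed, not preserved]
That passage is stated incorrectly. The gradient symbol should be removed, which leaves the loss function itself; backpropagating through that loss yields the gradient used to update the parameters.
The code implementation, however, is correct.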
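
To make the point concrete: with the return G treated as a constant, the scalar `-log_prob * G` is a surrogate loss whose derivative with respect to the policy parameters is `-G * ∇ log π(a|s)`, which is exactly the term a policy gradient update needs. The sketch below checks this numerically in plain Python rather than with `backward()`. The softmax-over-logits policy, the helper names, and the sample values are all assumptions made for illustration, not the book's code.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def surrogate_loss(logits, action, G):
    # Scalar loss -G * log pi(action); G is treated as a constant.
    return -G * math.log(softmax(logits)[action])

def analytic_policy_gradient_term(logits, action, G):
    # -G * d log pi(action) / d logits, using
    # d log softmax_a / d theta_k = 1[k == a] - pi_k.
    probs = softmax(logits)
    return [-G * ((1.0 if k == action else 0.0) - p)
            for k, p in enumerate(probs)]

def numeric_gradient(f, logits, eps=1e-6):
    # Central finite differences, standing in for what backward() computes.
    grads = []
    for k in range(len(logits)):
        up = list(logits)
        up[k] += eps
        down = list(logits)
        down[k] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads

# Hypothetical sample: three actions, action 1 was taken, return G = 3.0.
logits = [0.2, -0.5, 1.0]
action, G = 1, 3.0

grad_of_loss = numeric_gradient(lambda th: surrogate_loss(th, action, G), logits)
expected = analytic_policy_gradient_term(logits, action, G)
max_diff = max(abs(a - b) for a, b in zip(grad_of_loss, expected))
print("max difference:", max_diff)
```

The two gradients agree up to finite-difference error, so minimizing the surrogate loss with gradient descent moves the parameters along `G * ∇ log π(a|s)`, consistent with the reply above that the code is right and only the written formula carries a stray gradient symbol.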
