Chapter 13, DDPG algorithm: a minor oversight in the code practice #75

xiyanzzz · 2024-03-29T15:09:56Z

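In section 13.3 (DDPG code practice), the method def take_action(self, state): of the DDPG class should clip the action it returns.
return action -> return np.clip(action, -self.action_bound, self.action_bound)
This action is used by the Q-network to estimate the Q-value at the current time step, so it should not exceed the environment's action limits. The added exploration noise can push it outside those limits, although the probability is small.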
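To illustrate the proposed fix, here is a minimal standalone sketch using NumPy only. It is not the book's DDPG class: the function name take_action_clipped, the actor_output argument (standing in for the actor network's output), and the values sigma=0.5 and action_bound=2.0 are assumptions chosen for the example.

```python
import numpy as np

def take_action_clipped(actor_output, sigma, action_bound, rng):
    """Add Gaussian exploration noise, then clip to [-action_bound, action_bound].

    actor_output stands in for the actor network's deterministic action
    (hypothetical argument; the real method computes it from the state).
    """
    actor_output = np.asarray(actor_output, dtype=np.float64)
    # Exploration noise, one sample per action dimension
    noisy = actor_output + sigma * rng.standard_normal(actor_output.shape)
    # Without this clip, the noisy action can leave the valid range
    return np.clip(noisy, -action_bound, action_bound)

rng = np.random.default_rng(0)
# An actor output close to the upper bound makes overshoot likely before clipping
actions = np.array([
    take_action_clipped([1.95], sigma=0.5, action_bound=2.0, rng=rng)
    for _ in range(1000)
])
print(actions.min(), actions.max())
```

With the clip in place, every sampled action stays inside the bound regardless of how large the noise sample is, so the action seen by the critic and the one the environment can actually execute are consistent.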
