When using a continuous action space, there is no way to set upper and lower bounds on the output action values #9
Comments
The network output is not scaled according to the environment's limits.
Thank you, senior!
Thank you for taking time out of your busy schedule to answer my question; I'm very grateful.
On 2023-01-11 22:12:07, "tinyzqh" wrote:
> The network output is not scaled according to the environment's limits.
Hi, how did you end up setting the upper and lower bounds on the actions?
After line 83 in algorithms/utils/act.py, I apply a tanh activation to the output sampled from the normal distribution so that it is mapped into [-1, 1].
Sorry for the late reply, I've been busy with graduation. After line 83 in algorithms/utils/act.py, I apply a tanh activation to the output sampled from the normal distribution to map it into [-1, 1], and then rescale it to the required interval.
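The tanh-then-rescale step described above can be sketched as follows. This is a minimal illustration, not the repository's actual code; the function name `squash_to_range` and the bound values are hypothetical:

```python
import numpy as np

def squash_to_range(raw_action, low, high):
    """Squash an unbounded network output into [low, high].

    tanh maps raw_action into (-1, 1); the affine transform then
    rescales that interval to the environment's action bounds.
    """
    squashed = np.tanh(raw_action)                      # in (-1, 1)
    return low + 0.5 * (squashed + 1.0) * (high - low)  # in (low, high)

# e.g. a raw output of 0.0 lands exactly at the midpoint of the range
print(squash_to_range(0.0, 0.0, 90.0))  # → 45.0
```

Because tanh is applied elementwise, each action dimension is bounded independently of the rest of the batch, which is what makes this preferable to batch-statistics-based normalization.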
In env_runner.py, at the place where you would do the truncation, I added this normalization: actions_env = (actions - np.min(actions)) / (np.max(actions) - np.min(actions)). I'm not sure whether that's correct, or whether both actions and actions_env need to be normalized, e.g. actions = (actions - np.min(actions)) / (np.max(actions) - np.min(actions)) and then actions_env = actions.
I'm not sure whether normalization is needed there, but your normalization code appears to map into [0, 1]; I don't know whether that matches the output range you need. I followed the suggestion and used a tanh activation to get outputs in [-1, 1]. I also adjusted line 30 of env_continuous.py so that u_action_space = spaces.Box(low=-1.0, high=1.0, shape=(self.signal_action_dim,), dtype=np.float32).
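One caveat worth noting about the min-max approach above: normalizing with np.min/np.max of the current batch maps the batch into [0, 1], but the result depends on whatever other actions happen to be in that batch, not on fixed environment bounds. A small sketch (illustrative values only):

```python
import numpy as np

def minmax(x):
    """Batch min-max normalization into [0, 1]."""
    return (x - np.min(x)) / (np.max(x) - np.min(x))

batch_a = np.array([10.0, 20.0, 30.0])
batch_b = np.array([10.0, 20.0, 300.0])

# The same raw action value 20.0 maps to different normalized values
# depending on the other actions in the batch:
print(minmax(batch_a)[1])  # → 0.5
print(minmax(batch_b)[1])  # ≈ 0.034
```

This batch dependence is one reason an elementwise squashing function such as tanh is usually preferred for bounding actions.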
After this modification, were you able to train successfully? When I trained in the continuous action space without the tanh, I could see that it was learning, but it failed to converge later on and the curves fluctuated wildly.
Hi, after applying this mapping, the actions became NaN during training. Do you know of any way to fix this?
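One common source of NaNs when squashing a Gaussian policy with tanh is the log-probability: for saturated actions, 1 - tanh(u)^2 underflows to 0 and its log becomes infinite, which then poisons the gradients. The usual remedy (used, for example, in SAC implementations) is to add a small epsilon inside the log. This is a hedged sketch of that correction, not code from this repository:

```python
import numpy as np

def tanh_log_prob(raw_action, gauss_log_prob, eps=1e-6):
    """Correct a Gaussian log-probability for tanh squashing.

    The change-of-variables term is log(1 - tanh(u)^2). Without eps,
    this underflows to log(0) = -inf for saturated raw actions, which
    can propagate to NaN gradients during training.
    """
    return gauss_log_prob - np.log(1.0 - np.tanh(raw_action) ** 2 + eps)

# Even a heavily saturated raw action stays numerically finite:
print(np.isfinite(tanh_log_prob(50.0, 0.0)))  # → True
```

If the NaNs appear elsewhere (e.g. in the distribution's standard deviation), clamping the predicted log-std to a bounded range is another commonly used safeguard.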
Has this been resolved? I'm running into the same problem.
Hello, after I changed line 30 of env_continuous.py to u_action_space = spaces.Box(low=0.0, high=90.0, shape=(self.signal_action_dim,), dtype=np.float32), the action values were still not limited to between 0 and 90. Why is that? Thank you.
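A likely explanation for the question above: gym's spaces.Box only declares the valid range of the action space; it does not constrain what the policy network actually outputs. The action has to be explicitly clipped (or squashed, as discussed earlier in this thread) before being passed to the environment. A minimal sketch, with the bound values taken from the comment above:

```python
import numpy as np

LOW, HIGH = 0.0, 90.0  # matching spaces.Box(low=0.0, high=90.0, ...)

def clip_action(action):
    """spaces.Box declares bounds but does not enforce them on the
    policy output; np.clip applies them explicitly."""
    return np.clip(action, LOW, HIGH)

print(clip_action(120.0))  # → 90.0
print(clip_action(-5.0))   # → 0.0
```

Either this kind of explicit clipping in the runner, or the tanh-plus-rescale mapping described above, is needed to actually keep actions in bounds.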