While running TRPO training, after a random interval (anywhere from 15 seconds to 1 minute) it fails with the following:
Traceback (most recent call last):
  File "callback.py", line 196, in <module>
    model.learn(total_timesteps=time_steps, callback=callback, tb_log_name=tb_sub_dir)
  File "/root/stable-baselines/stable_baselines/trpo_mpi/trpo_mpi.py", line 427, in learn
    self.vfadam.update(grad, self.vf_stepsize)
  File "/root/stable-baselines/stable_baselines/common/mpi_adam.py", line 61, in update
    step = (- step_size) * self.exp_avg / (np.sqrt(self.exp_avg_sq) + self.epsilon)
FloatingPointError: underflow encountered in multiply
Using the most recent stable-baselines version, 2.9.0, with Python 3.7.5.
Training a custom Gym environment with TRPO fails after a random interval (anywhere from 30 seconds to 3 minutes) with the following traceback.
The error occurs only with TRPO. Running the same code and environment with another RL algorithm completes successfully.
I tried the code below on CartPole-v1, but it did not cause an error there (maybe because it is an easy environment).
Traceback (most recent call last):
  File "callback.py", line 196, in <module>
    model.learn(total_timesteps=time_steps, callback=callback, tb_log_name=tb_sub_dir)
  File "/root/stable-baselines/stable_baselines/trpo_mpi/trpo_mpi.py", line 427, in learn
    self.vfadam.update(grad, self.vf_stepsize)
  File "/root/stable-baselines/stable_baselines/common/mpi_adam.py", line 61, in update
    step = (- step_size) * self.exp_avg / (np.sqrt(self.exp_avg_sq) + self.epsilon)
FloatingPointError: underflow encountered in multiply
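For anyone trying to reproduce this without the custom environment, here is a minimal sketch that evaluates the expression from the last traceback frame in isolation, using NumPy only. The function name `adam_step` and all numeric values (step size, epsilon, and the subnormal first-moment estimate) are assumptions chosen for illustration, not values taken from the failing run. It only shows that the multiply can raise this exact error class when `exp_avg` is extremely small; it does not establish why the value becomes that small during training.

```python
import numpy as np


def adam_step(exp_avg, exp_avg_sq, step_size=3e-4, epsilon=1e-8):
    """Evaluate the expression shown in the traceback (mpi_adam.py, line 61).

    step_size and epsilon are illustrative defaults, not read from the library.
    """
    return (-step_size) * exp_avg / (np.sqrt(exp_avg_sq) + epsilon)


# Hypothetical moment estimates: exp_avg is a subnormal float64, so scaling it
# by step_size produces a result too small to represent.
exp_avg = np.array([1e-322])
exp_avg_sq = np.array([1e-30])

raised = None
with np.errstate(all='raise'):
    try:
        adam_step(exp_avg, exp_avg_sq)
    except FloatingPointError as err:
        raised = str(err)

print(raised)
```

If `raised` is not `None`, the multiply underflowed in the same way as in the report.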
In the beginning the code seems to start fine. At some point, however, it goes into a "silent loop" with no updates on the console, as if it were frozen. The only way to reveal the error and force it to be raised is to add the following to the top of stable-baselines\stable_baselines\trpo_mpi\utils.py,
after the line "import numpy as np": np.seterr(all='raise')
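A note on what `np.seterr(all='raise')` changes: NumPy ignores underflow by default, so this setting turns every floating-point condition, including harmless underflow, into an exception. The sketch below uses the `np.errstate` context manager to raise on everything except underflow. The values are made up for illustration, and this is only a way to narrow the diagnostics, not a fix for the apparent freeze described above.

```python
import numpy as np

tiny = np.array([1e-322])  # hypothetical subnormal value

# Raise on divide/overflow/invalid, but let underflow round silently to zero.
with np.errstate(all='raise', under='ignore'):
    flushed = tiny * 1e-10  # underflows; no exception because under='ignore'

    try:
        np.array([1.0]) / np.array([0.0])
        divide_raised = False
    except FloatingPointError:
        divide_raised = True  # division by zero still raises

print(flushed, divide_raised)
```

Unlike `np.seterr`, the context manager restores the previous error settings on exit, so it can be wrapped around a single call while testing.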
Code example
from stable_baselines import TRPO  # DQN, PPO2, A2C, ACKTR
import tensorflow.compat.v1.logging as tflogging
tflogging.set_verbosity(tflogging.ERROR)  # suppress TF warnings
import gym
import numpy as np
np.seterr(all='raise')  # raise FloatingPointError on any floating-point error
env = gym.make('Myrl-v0')  # custom environment
model = TRPO('MlpPolicy', env, verbose=0)
model.learn(total_timesteps=900000)
System Info
stable-baselines 2.9.0 (the most recent version), Python 3.7.5
Windows 10
TF 1.15
no GPU
installed via git and then "pip install -e ."