TRPO "underflow encountered in multiply" #59

jarlva · 2019-12-20T17:01:30Z

While training with TRPO, after a random interval (anywhere from 15 seconds to 1 minute) the run fails with the following:
Traceback (most recent call last):
  File "callback.py", line 196, in <module>
    model.learn(total_timesteps=time_steps, callback=callback, tb_log_name=tb_sub_dir)
  File "/root/stable-baselines/stable_baselines/trpo_mpi/trpo_mpi.py", line 427, in learn
    self.vfadam.update(grad, self.vf_stepsize)
  File "/root/stable-baselines/stable_baselines/common/mpi_adam.py", line 61, in update
    step = (- step_size) * self.exp_avg / (np.sqrt(self.exp_avg_sq) + self.epsilon)
FloatingPointError: underflow encountered in multiply

Using the recent version, 2.9.0, Python 3.7.5.


araffin · 2019-12-20T17:05:03Z

Hello,
Please fill in the issue template completely.

jarlva · 2019-12-21T18:15:11Z

I am training a custom Gym environment with TRPO. After a random interval (anywhere from 30 seconds to 3 minutes), training fails with the traceback below.
The error occurs only with TRPO. The same code and environment complete successfully with other RL algorithms.
I also tried the code below on CartPole-v1, but it does not trigger the error there (perhaps because the task is easy).

Traceback (most recent call last):
  File "callback.py", line 196, in <module>
    model.learn(total_timesteps=time_steps, callback=callback, tb_log_name=tb_sub_dir)
  File "/root/stable-baselines/stable_baselines/trpo_mpi/trpo_mpi.py", line 427, in learn
    self.vfadam.update(grad, self.vf_stepsize)
  File "/root/stable-baselines/stable_baselines/common/mpi_adam.py", line 61, in update
    step = (- step_size) * self.exp_avg / (np.sqrt(self.exp_avg_sq) + self.epsilon)
FloatingPointError: underflow encountered in multiply

At the beginning the code seems to run fine. At some point, however, it goes into a "silent loop": there are no further updates on the console, as if it were frozen. The only way I found to reveal the error is to add the following to the top of stable-baselines\stable_baselines\trpo_mpi\utils.py, after the line "import numpy as np":
np.seterr(all='raise')
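To illustrate what np.seterr(all='raise') changes, here is a minimal NumPy-only sketch. It evaluates the same expression that appears in the traceback, using made-up float32 values in which the first-moment estimate has decayed close to zero. The helper names adam_step and try_step and all numeric values are assumptions for illustration; they are not taken from stable-baselines, and this does not establish that the same values occur in the reporter's run.

```python
import numpy as np


def adam_step(exp_avg, exp_avg_sq, step_size, epsilon=1e-8):
    """Same expression as the line quoted in the traceback."""
    return (-step_size) * exp_avg / (np.sqrt(exp_avg_sq) + epsilon)


def try_step(**errstate_kwargs):
    """Run the step under the given NumPy error settings.

    Returns (result, None) on success or (None, message) if NumPy raised.
    """
    # Hypothetical values: a float32 moment estimate that is tiny but still normal.
    exp_avg = np.array([1e-36], dtype=np.float32)
    exp_avg_sq = np.array([1e-4], dtype=np.float32)
    with np.errstate(**errstate_kwargs):
        try:
            return adam_step(exp_avg, exp_avg_sq, step_size=3e-4), None
        except FloatingPointError as err:
            return None, str(err)


# With every floating-point condition set to raise, the tiny product
# step_size * exp_avg falls below the float32 normal range and NumPy raises.
print(try_step(all="raise"))

# Keeping the other checks but ignoring underflow lets the step complete.
print(try_step(all="raise", under="ignore"))
```

NumPy ignores underflow by default, so the exception here is a consequence of the 'raise' setting rather than of the update rule itself. If the goal of np.seterr is to catch overflow, division by zero, or invalid values, passing under='ignore' alongside all='raise' is one possible way to keep those checks without stopping on underflow. Whether that helps with the apparent freeze described above is not something this sketch can show.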

Code example
from stable_baselines import TRPO  # alternatives: DQN, PPO2, A2C, ACKTR
import tensorflow.compat.v1.logging as tflogging
tflogging.set_verbosity(tflogging.ERROR)  # suppress TF warnings

import gym
import numpy as np
np.seterr(all='raise')  # raise FloatingPointError on any floating-point error

env = gym.make('Myrl-v0')
model = TRPO('MlpPolicy', env, verbose=0)
model.learn(total_timesteps=900000)

System Info
Using the recent version, 2.9.0, Python 3.7.5.
Windows 10
TF 1.15
no GPU
installed via git and then "pip install -e ."

araffin added the "more information needed" label (Please fill the issue template) · Dec 20, 2019

araffin added the "custom gym env" label (Issue related to Custom Gym Env) and removed the "more information needed" label (Please fill the issue template) · Dec 21, 2019
