Logging RL results and tracking them with ModelCheckpoint(monitor=...) #5883

TrentBrick · 2020-11-09T01:59:49Z

TrentBrick
Nov 9, 2020

I am using Pytorch Lightning in an RL setting and want to save a model when it hits a new max average reward. I am using the Tensorboard logger where I return my neural network loss in the training_step() using:

logs = {"policy_loss": pred_loss}
return {'loss':pred_loss, 'log':logs}

And then I am saving my RL environment rewards using in on_epoch_end():

self.logger.experiment.add_scalar("mean_reward", np.mean(reward_losses), self.global_step)
self.logger.experiment.add_scalars('rollout_stats', {"std_reward":np.std(reward_losses),
                "max_reward":np.max(reward_losses), "min_reward":np.min(reward_losses)}, self.global_step)

And every 5 epochs I am also writing out another RL reward loss where I use the best actions rather than sampling from them:

if self.current_epoch % self.hparams['eval_every']==0 and self.logger:
            output = self.collect_rollouts(greedy=True, num_episodes=self.hparams['eval_episodes'])
            reward_losses = output[0]
            self.logger.experiment.add_scalar("eval_mean", np.mean(reward_losses), self.global_step)

My question is, how can I set my ModelCheckpoint to monitor eval_mean (which is only written out every 5 epochs, this seems like it would be a problem)? I would also settle for monitoring mean_reward (written out every epoch)? Right now I can only successfully monitor policy_loss which does not always correspond to higher rewards obtained (setting monitor = to anything else throws an error).

I know that in the new PL version self.log() should be used but after re-writing my code using this it still didn't solve my issue.

I have spent a lot of time looking through the docs and for examples of this but I have found the logging docs on this to be quite sparse and difficult to even get everything to log in the first place.

I am using Pytorch Lightning 1.0.5 and Pytorch 1.7.0.

Thank you for any help/guidance.

Answered by awaelchli

Nov 9, 2020

I have multiple comments that I did not verify yet but they might help

If I'm not mistaken, self.log only works within a selection of hooks currently. I suggest you try to move the relevant code to training_epoch_end where self.log should work correctly.
set the monitor key in the ModelCheckpoint(monitor=) explicitly.
You have the problem that you can only update/log every n epochs: I see two solutions: 1) synchronize your ModelCheckpoint with the period parameter to only run on the epochs you update the monitor quantity. 2) Cache the last value and log it in the epochs between your regular interval, to make the ModelCheckpoint see it as unchanged. The second option may even be the defau…

View full answer

2020-11-09T02:00:30Z

github-actions[bot]
bot Nov 9, 2020

Hi! thanks for your contribution!, great first issue!

0 replies

awaelchli · 2020-11-09T16:45:12Z

awaelchli
Nov 9, 2020

I have multiple comments that I did not verify yet but they might help

If I'm not mistaken, self.log only works within a selection of hooks currently. I suggest you try to move the relevant code to training_epoch_end where self.log should work correctly.
set the monitor key in the ModelCheckpoint(monitor=) explicitly.
You have the problem that you can only update/log every n epochs: I see two solutions: 1) synchronize your ModelCheckpoint with the period parameter to only run on the epochs you update the monitor quantity. 2) Cache the last value and log it in the epochs between your regular interval, to make the ModelCheckpoint see it as unchanged. The second option may even be the default behavior by Lightning but need to verify.

So in summary, I imagine something like this:

# Model

def training_epoch_end(self, outputs):
    # ... compute reward losses
    
    if self.current_epoch % self.hparams['eval_every']==0:
        self.last_eval_mean = # compute the new eval mean

     self.log("eval_mean", self.last_eval_mean)


# Trainer
trainer = Trainer(callbacks=[ModelCheckpoint(monitor="eval_mean")]

# or maybe also try
trainer = Trainer(callbacks=[ModelCheckpoint(monitor="eval_mean", period=self.hparams['eval_every'])]

0 replies

TrentBrick · 2020-11-09T18:16:27Z

TrentBrick
Nov 9, 2020
Author

Thanks for this all of this. It sounds like the fundamental problem may be that with my code I was not logging from training_epoch_end()? Because I was setting monitor= to explicitly track either eval_mean or mean_reward but they wouldn't be detected.

I will try this and let you know if it works.

0 replies

NotNANtoN · 2020-11-22T12:11:40Z

NotNANtoN
Nov 22, 2020

I had a very similar issue: in my reinforcement learning framework I wanted to measure the validation performance of my agent. Of course I would do so without a validation_dataloader, hence I thought I could just set that dataloader to None and define a validation_step myself. Unfortunately, given a validation_dataloader that is None, validation_step and all validation methods are not called at all.
I tried to solve this via a callback to on_train_epoch_end or on_epoch_end. This worked, but in those callbacks the self.log() call does not work at all - most importantly, there is no feedback from pytorch_lightning that the call was unsuccesfull. Luckily enough, I found this thread here and moved my validation code into training_epoch_end, which works.

Maybe pytorch_lightning could at least give a warning once one tries to use self.log() in a place where it has no effect?

0 replies

awaelchli · 2020-11-23T03:13:54Z

awaelchli
Nov 23, 2020

Regarding the self.log() from callbacks, @tchaton was working on this in #3813 and it should now be working.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Logging RL results and tracking them with ModelCheckpoint(monitor=...) #5883

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 5 comments

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

Select a reply

Logging RL results and tracking them with ModelCheckpoint(monitor=...) #5883

TrentBrick Nov 9, 2020

Replies: 5 comments

github-actions[bot] bot Nov 9, 2020

awaelchli Nov 9, 2020

TrentBrick Nov 9, 2020 Author

NotNANtoN Nov 22, 2020

awaelchli Nov 23, 2020

TrentBrick
Nov 9, 2020

github-actions[bot]
bot Nov 9, 2020

awaelchli
Nov 9, 2020

TrentBrick
Nov 9, 2020
Author

NotNANtoN
Nov 22, 2020

awaelchli
Nov 23, 2020