Using Manual Optimisation #5780
Replies: 15 comments 21 replies
-
Also, it seems the Trainer's option has been deprecated in favour of a LightningModule property (#5011). It's also not clear whether there is anything else we should worry about when doing manual optimization in a distributed setting.
-
I have exactly the same issue.
-
+1 on this issue. Documentation is very unclear about how to work with this going forward.
-
I was confused by this but managed to figure this nuance out.
There's probably an easier way (I'd hope you can set it with a one-liner), but I haven't been able to figure it out quite yet.
-
That though still doesn't explain what is and isn't managed by Lightning itself once you've turned automatic optimisation off :(
-
Also, I have been getting
-
Yeah we still need to pass it in (although it's largely useless for manual optimisation - from what I'm reading)
-
I am also confused about how to handle the scheduler manually. From what I've seen, the scheduler does not work properly when using a checkpoint.
-
Hey everyone, it is possible to use it as follows:

    def training_step(self, batch, batch_idx, optimizer_idx):
        (opt_a, opt_b) = self.optimizers()
        loss_a = ...
        # automatically applies scaling, etc...
        self.manual_backward(loss_a, opt_a)
        opt_a.step()

Best,
-
    import torch
    import torch.nn.functional as F
    from pytorch_lightning import Trainer

    # BoringModel (a test helper model) and tmpdir (a pytest fixture)
    # come from the surrounding test suite


    class TestModel(BoringModel):
        def training_step(self, batch, batch_idx, optimizer_idx):
            # emulate GAN training
            opt_gen, opt_dis = self.optimizers()

            # Note: Be careful, don't log on the same key in self.log in both closures
            # as they will be aggregated together on epoch_end

            def compute_loss():
                x = batch[0]
                x = F.dropout(x, 0.1)
                predictions = self(x)
                predictions = F.dropout(predictions, 0.1)
                loss = self.loss(None, predictions)
                return loss

            def gen_closure():
                loss_gen = compute_loss()
                self.log("loss_gen", loss_gen, on_step=True, on_epoch=True)
                self.manual_backward(loss_gen, opt_gen)

            def dis_closure():
                loss_dis = compute_loss()
                self.log("loss_dis", loss_dis, on_step=True, on_epoch=True)
                self.manual_backward(loss_dis, opt_dis)

            # this will accumulate gradients for 2 batches and then call opt_gen.step()
            opt_gen.step(closure=gen_closure, make_optimizer_step=(batch_idx % 2 == 0), optim='sgd')

            # update discriminator every 4 batches
            # therefore, no gradient accumulation for discriminator
            if batch_idx % 4 == 0:
                # Note: Set make_optimizer_step to True or it will use by default
                # Trainer(accumulate_grad_batches=x)
                opt_dis.step(closure=dis_closure, make_optimizer_step=True)

        def training_epoch_end(self, outputs) -> None:
            # outputs should be an array with an entry per optimizer
            assert len(outputs) == 2

        def configure_optimizers(self):
            optimizer_gen = torch.optim.SGD(self.layer.parameters(), lr=0.1)
            optimizer_dis = torch.optim.Adam(self.layer.parameters(), lr=0.001)
            return [optimizer_gen, optimizer_dis]

        @property
        def automatic_optimization(self) -> bool:
            return False


    model = TestModel()
    model.val_dataloader = None
    model.training_epoch_end = None

    limit_train_batches = 8
    trainer = Trainer(
        default_root_dir=tmpdir,
        limit_train_batches=limit_train_batches,
        limit_val_batches=2,
        max_epochs=1,
        log_every_n_steps=1,
        accumulate_grad_batches=2,
        enable_pl_optimizer=True,
    )
    trainer.fit(model)
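To make the stepping pattern in the example above concrete, here is a small plain-Python sketch (no Lightning involved) that tabulates on which batches each optimizer would step under the two conditions used there. The helper name `stepping_schedule` is hypothetical, and it only replays the `batch_idx % 2 == 0` and `batch_idx % 4 == 0` checks; it does not model what `make_optimizer_step` does inside Lightning.

```python
def stepping_schedule(num_batches):
    """Return, per batch index, which optimizers would step.

    Hypothetical helper: it only replays the two modulo conditions
    from the example above and does not call into Lightning.
    """
    schedule = []
    for batch_idx in range(num_batches):
        stepping = []
        # generator: the closure runs every batch, the step is requested on even batches
        if batch_idx % 2 == 0:
            stepping.append("gen")
        # discriminator: closure and step only run on every 4th batch
        if batch_idx % 4 == 0:
            stepping.append("dis")
        schedule.append((batch_idx, stepping))
    return schedule


if __name__ == "__main__":
    # 8 matches limit_train_batches in the example above
    for batch_idx, stepping in stepping_schedule(8):
        print(batch_idx, stepping)
```

With 8 batches, the generator condition holds on batches 0, 2, 4 and 6, and the discriminator condition on batches 0 and 4.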
-
Hi @tchaton! I have some questions regarding this manual mode.
-
Thanks for the thread, it greatly helped me set up a manual optimisation procedure correctly. I'd like to point out that the property override

    class SomeModule(LightningModule):
        @property
        def automatic_optimization(self) -> bool:
            return False

was not sufficient for me, I was required to also set
-
So the correct way to set the

    @property
    def automatic_optimization(self) -> bool:
        return False

This is VERY unclearly documented I think... The only info there is "Please use the property on the LightningModule for disabling automatic optimization" in the deprecation warning you get when you use the
-
I'm working on improving the optimization docs in #6907. It would be wonderful if you could have a look and leave some feedback there. Please let me know if there's still something unclear.
-
From 1.3, Lightning supports learning rate schedulers in manual optimization for you to call

    class Model(LightningModule):
        def __init__(self):
            super().__init__()
            self.automatic_optimization = False
            ...

        def training_step(self, batch, batch_idx):
            # forward, backward, optimization
            ...

            # get your LR schedulers
            lr_scheduler = self.lr_schedulers()  # single scheduler
            # or, with multiple schedulers:
            # lr_scheduler1, lr_scheduler2 = self.lr_schedulers()

            # call `.step()`
            lr_scheduler.step()

The recently updated doc has an example for calling
-
❓ Questions and Help
What is your question?
The documentation for manual optimisation is vague and doesn't provide any complete examples of how to use it correctly. Hence, I need to know what Lightning takes care of, and what I have to do within the training loop.
To be more specific, do I need to zero gradients and call step on the learning rate scheduler myself?
Given that this is "manual" mode I'd be fine having to do it (and half expect I will), but what's extremely confusing is that the given examples seem to switch between stating/showing gradients being manually zeroed and not being touched at all...
Take the current optimizer section example: it does not show anything being zeroed. On the other hand, the documentation for the Trainer's manual optimisation shows the gradients being explicitly zeroed.
So which is it?
Furthermore, how do I access/step for the learning rate scheduler (or is that not something for me to handle here)?
What have you tried?
For a little context, what I'm trying to do is to port regular PyTorch GAN code into Lightning.
The module dynamically selects whether to train the generator or discriminator at the start of each batch depending on the discriminator's loss.
So if the loss is below some threshold it'll backpropagate & optimise for the discriminator, otherwise generator.
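A minimal sketch of the selection rule just described, written as plain Python so it can be read independently of Lightning. The function name `network_to_train`, the string labels, and the example threshold of 0.5 are hypothetical; it also assumes the discriminator loss is available as a plain float (for instance, the value from the previous batch).

```python
def network_to_train(disc_loss, threshold):
    """Pick which network to optimise for the current batch.

    Follows the rule as stated above: optimise the discriminator when its
    loss is below `threshold`, otherwise optimise the generator.
    """
    return "discriminator" if disc_loss < threshold else "generator"


if __name__ == "__main__":
    # hypothetical discriminator losses from consecutive batches
    for disc_loss in [0.2, 0.7, 0.5]:
        print(disc_loss, network_to_train(disc_loss, threshold=0.5))
```

Inside `training_step`, the returned label would decide which optimizer receives the `manual_backward` and `step()` calls for that batch.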
I previously (in <=1.0.8) used automatic optimisation with a custom optimizer step function; however, that in all honesty was quite clunky and no longer works with `accumulate_grad_batches` (which we need as we're working with extremely large 3D data). Instead, today I've written code to check whether this batch is for training the discriminator or the generator and, based on that, run `self.manual_backward(loss, optimizer); optimizer.step()`. I'm pleased that it runs, but I can't find any documentation which actually specifies whether this is enough to use the scheduler and accumulated gradient batches.
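A minimal plain-PyTorch sketch of the pattern being asked about, under the assumption that in manual mode gradient zeroing, accumulation, and scheduler stepping all have to be written out by hand (what Lightning handles for you may differ between versions, so treat this as illustrative only). The toy `Linear` model, the random data, `accumulate = 2`, and the `StepLR` settings are placeholders; inside a LightningModule the `backward()` call would go through `self.manual_backward` instead.

```python
import torch

torch.manual_seed(0)

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# halve the learning rate on every scheduler step (illustrative choice)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

accumulate = 2       # hypothetical number of batches to accumulate
optimizer_steps = 0

optimizer.zero_grad()
for batch_idx in range(4):
    x = torch.randn(8, 4)
    y = torch.randn(8, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    # scale the loss so the accumulated gradient is an average over the batches
    (loss / accumulate).backward()
    if (batch_idx + 1) % accumulate == 0:
        optimizer.step()       # apply the accumulated gradients
        optimizer.zero_grad()  # clear them before the next accumulation window
        scheduler.step()       # advance the schedule once per optimizer step
        optimizer_steps += 1

final_lr = optimizer.param_groups[0]["lr"]
print(optimizer_steps, final_lr)
```

Stepping the scheduler once per optimizer step is a design choice made for this sketch; stepping it once per epoch is equally common and only changes where the `scheduler.step()` call sits.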
What's your environment?
Thanks so much for any help!