Using Manual Optimisation #5780

KamWithK · 2020-12-13T06:19:53Z

❓ Questions and Help

What is your question?

The documentation for manual optimisation is vague and doesn't provide any complete examples of how to use it correctly. Hence, I need to know what Lightning takes care of, and what I have to do within the training loop.
To be more specific, do I need to zero gradients and call step on the learning rate scheduler myself?

Given that this is "manual" mode I'd be fine having to do it (and half expect I will), but what's extremely confusing is that the given examples seem to switch between stating/showing gradients being manually zeroed and not being touched at all...
Take the current optimizer section example: it does not show anything being zeroed. On the other hand, the documentation for the Trainer's manual optimisation shows the gradients being explicitly zeroed.
So which is it?

Furthermore, how do I access/step for the learning rate scheduler (or is that not something for me to handle here)?

What have you tried?

For a little context, what I'm trying to do is to port regular PyTorch GAN code into Lightning.
The module dynamically selects whether to train the generator or discriminator at the start of each batch depending on the discriminator's loss.
So if the loss is below some threshold it'll backpropagate & optimise for the discriminator, otherwise generator.

Previously (in <=1.0.8) I used automatic optimisation with a custom optimizer step function; however, that was in all honesty quite clunky and no longer works with accumulate_grad_batches (which we need as we're working with extremely large 3D data).
Instead, today I've written code that checks whether the current batch is for training the discriminator or the generator and, based on that, runs self.manual_backward(loss, optimizer); optimizer.step().
I'm pleased that it runs, but can't seem to see any documentation which actually specifies whether this is enough to use the scheduler and accumulated gradient batches.
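
A minimal, framework-free sketch of the per-batch selection described above. It assumes the guidance given later in this thread (from 1.2, the user calls zero_grad, backward, and step explicitly). `StubOptimizer`, `manual_step`, and the `backward` callable are hypothetical stand-ins that only record calls; they are not Lightning or PyTorch APIs.

```python
class StubOptimizer:
    """Hypothetical stand-in that only records calls; not a real optimizer."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def zero_grad(self):
        self.calls.append("zero_grad")

    def step(self):
        self.calls.append("step")


def manual_step(disc_loss, threshold, opt_gen, opt_dis, backward):
    """Train exactly one network per batch, chosen by the discriminator loss."""
    # Below the threshold the discriminator is trained, otherwise the generator
    # (the rule described in the post above).
    opt = opt_dis if disc_loss < threshold else opt_gen
    opt.zero_grad()
    backward(opt.name)  # stand-in for self.manual_backward(loss)
    opt.step()
    return opt.name


opt_gen = StubOptimizer("generator")
opt_dis = StubOptimizer("discriminator")
backward_log = []

chosen = manual_step(0.2, 0.5, opt_gen, opt_dis, backward_log.append)
print(chosen, opt_dis.calls, opt_gen.calls)
```

In a real LightningModule the two stubs would presumably be the objects returned by self.optimizers(), and the `backward` callable would be replaced by self.manual_backward(loss).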

What's your environment?

OS: Windows Subsystem for Linux (Ubuntu)
Packaging: pip installed into conda environment
Version: 1.1.0

Thanks so much for any help!

CarloLucibello · 2020-12-14T13:09:19Z

Also, if the Trainer's option has been deprecated in favour of a LightningModule's property #5011
https://pytorch-lightning.readthedocs.io/en/stable/api/pytorch_lightning.core.lightning.html#pytorch_lightning.core.lightning.LightningModule.automatic_optimization
the documentation should be made consistent.

It's also not clear if there is something else that we should worry about when doing manual optimization in a distributed setting

0 replies

AliKarimi74 · 2020-12-16T09:19:58Z

I have exactly the same issue.

automatic_optimization is deprecated in v1.1.0, but there is no alternative in the documentation. There is an automatic_optimization property in LightningModule, but it's not obvious how to change it. Also, the documentation is inconsistent about calling zero_grad manually. Nor does it mention anywhere whether the scheduler's step is handled by Lightning.

0 replies

blakedewey · 2020-12-16T20:24:03Z

+1 on this issue. Documentation is very unclear about how to work with this going forward.

0 replies

KamWithK · 2020-12-16T22:34:43Z

Author

automatic_optimization is deprecated in v1.1.0, but there is no alternative in the documentation. There is an automatic_optimization property in LightningModule, but it's not obvious how to change it. Also, the documentation is inconsistent about calling zero_grad manually. Nor does it mention anywhere whether the scheduler's step is handled by Lightning.

I was confused by this but managed to figure this nuance out.
You have to define it as a property, so something like:

class SomeModule(LightningModule):
    @property
    def automatic_optimization(self) -> bool:
        return False

There's probably an easier way (I'd hope you can set it with a one-liner), but I haven't been able to figure it out quite yet.

1 reply

leeh2213 Jul 22, 2024

I ran into the same issue when training a VQ-GAN model, and found the corresponding answer in the Lightning documentation; here is the link where you can find the answer. You should add "self.automatic_optimization = False" in the "__init__" function.

KamWithK · 2020-12-16T22:35:44Z

Author

That though still doesn't explain what is and isn't managed by Lightning itself once you've turned automatic optimisation off :(

0 replies

blakedewey · 2020-12-17T16:29:39Z

Also, I have been getting ValueError: Your LightningModule defines 2 optimizers but training_step is missing the "optimizer_idx" argument. when using manual optimization via the LightningModule parameter.

0 replies

KamWithK · 2020-12-18T01:00:13Z

Author

Also, I have been getting ValueError: Your LightningModule defines 2 optimizers but training_step is missing the "optimizer_idx" argument. when using manual optimization via the LightningModule parameter.

Yeah we still need to pass it in (although it's largely useless for manual optimisation - from what I'm reading)

0 replies

heng-yuwen · 2020-12-20T12:42:14Z

I am also confused about how to handle the scheduler manually. From what I've seen, the scheduler does not work properly when using checkpoints.

0 replies

tchaton · 2020-12-21T17:15:11Z

Maintainer

Hey everyone,

It is possible to use it as follows:

            def training_step(...):
                (opt_a, opt_b) = self.optimizers()
                loss_a = ...
                # automatically applies scaling, etc...
                self.manual_backward(loss_a, opt_a)
                opt_a.step()

Best,
T.C

2 replies

surya-narayanan May 7, 2021

Hi, do we have to pass in the optimizer as an argument for manual_backward?

carmocca May 7, 2021

Not anymore, passing the optimizer was deprecated in v1.2 and will be removed in v1.4

tchaton · 2020-12-21T17:17:31Z

Maintainer

    class TestModel(BoringModel):
        def training_step(self, batch, batch_idx, optimizer_idx):

            # emulate GAN training
            opt_gen, opt_dis = self.optimizers()

            # Note: Be careful not to log to the same key with self.log in both closures,
            # as the values will be aggregated together on epoch_end

            def compute_loss():
                x = batch[0]
                x = F.dropout(x, 0.1)
                predictions = self(x)
                predictions = F.dropout(predictions, 0.1)
                loss = self.loss(None, predictions)
                return loss

            def gen_closure():
                loss_gen = compute_loss()
                self.log("loss_gen", loss_gen, on_step=True, on_epoch=True)
                self.manual_backward(loss_gen, opt_gen)

            def dis_closure():
                loss_dis = compute_loss()
                self.log("loss_dis", loss_dis, on_step=True, on_epoch=True)
                self.manual_backward(loss_dis, opt_dis)

            # this will accumulate gradients for 2 batches and then call opt_gen.step()
            opt_gen.step(closure=gen_closure, make_optimizer_step=(batch_idx % 2 == 0), optim='sgd')

            # update discriminator every 4 batches
            # therefore, no gradient accumulation for discriminator
            if batch_idx % 4 == 0 :
                # Note: Set make_optimizer_step to True or it will use by default
                # Trainer(accumulate_grad_batches=x)
                opt_dis.step(closure=dis_closure, make_optimizer_step=True)

        def training_epoch_end(self, outputs) -> None:
            # outputs should be an array with an entry per optimizer
            assert len(outputs) == 2

        def configure_optimizers(self):
            optimizer_gen = torch.optim.SGD(self.layer.parameters(), lr=0.1)
            optimizer_dis = torch.optim.Adam(self.layer.parameters(), lr=0.001)
            return [optimizer_gen, optimizer_dis]

        @property
        def automatic_optimization(self) -> bool:
            return False

    model = TestModel()
    model.val_dataloader = None
    model.training_epoch_end = None

    limit_train_batches = 8
    trainer = Trainer(
        default_root_dir=tmpdir,
        limit_train_batches=limit_train_batches,
        limit_val_batches=2,
        max_epochs=1,
        log_every_n_steps=1,
        accumulate_grad_batches=2,
        enable_pl_optimizer=True,
    )

    trainer.fit(model)
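
A small sketch of which batch indices satisfy the two step conditions in the example above, given limit_train_batches = 8. It only evaluates the `batch_idx % 2 == 0` and `batch_idx % 4 == 0` expressions; it does not call Lightning, so it says nothing about how make_optimizer_step behaves internally. The names `gen_steps` and `dis_steps` are illustrative.

```python
limit_train_batches = 8

# Batches on which make_optimizer_step=(batch_idx % 2 == 0) evaluates to True.
gen_steps = [i for i in range(limit_train_batches) if i % 2 == 0]

# Batches on which the discriminator branch (batch_idx % 4 == 0) is entered.
dis_steps = [i for i in range(limit_train_batches) if i % 4 == 0]

print("generator steps on batches:", gen_steps)
print("discriminator steps on batches:", dis_steps)
```

Because both conditions hold at batch_idx 0, the first generator step in this schedule would be requested after a single batch rather than after two.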

1 reply

nihirv Jul 12, 2021

Do you need to specify accumulate_grad_batches in Trainer if you're manually calling opt.step with make_optimizer_step at the relevant batch_idx?

showgood163 · 2020-12-22T14:05:56Z

Hi @tchaton! I have some questions regarding this manual mode.

In opt_dis.step(closure=dis_closure, make_optimizer_step=True), is this step function the one in PyTorch? Does the closure here support a function with input parameters?
What should I do if I have different losses for the same optimizer that need to be optimized in one batch but with separate steps? For example, a normal discriminator loss that needs to be updated every step, and a discriminator regularization loss that needs to be updated every 16 steps, after the step for the normal loss.
Will there be a detailed elaboration of this manual mode?

0 replies

pierresegonne · 2021-01-27T12:27:22Z

Thanks for the thread, it greatly helped me set up a manual optimisation procedure correctly.

I'd like to point out that the property override

class SomeModule(LightningModule):
    @property
    def automatic_optimization(self) -> bool:
        return False

was not sufficient for me; I also had to set automatic_optimization=False when instantiating my trainer with Trainer.from_argparse_args(..)

0 replies

rubencart · 2021-02-16T16:28:23Z

So the correct way to set the automatic_optimization attribute to False is by overriding the automatic_optimization(...) method? Like

@property
def automatic_optimization(self) -> bool:
    return False

This is VERY unclearly documented I think... The only info there is "Please use the property on the LightningModule for disabling automatic optimization" in the deprecation warning you get when you use the Trainer argument?

8 replies

tchaton Feb 26, 2021
Maintainer

Dear All,

As noted before, automatic_optimization is now a property of the model.

For example:

    class TestModel(BoringModel):

        def training_step(self, batch, *_, **kwargs):
            opt = self.optimizers()
            output = self.layer(batch)
            loss = self.loss(batch, output)
            opt.zero_grad()
            self.manual_backward(loss)
            opt.step()

        @property
        def automatic_optimization(self):
            return False

    model = TestModel()
    model.training_epoch_end = None
    trainer = Trainer(
        default_root_dir=tmpdir,
        fast_dev_run=True,
    )

The model property decides the behaviour, as in: https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/connectors/model_connector.py#L30

 automatic_optimization = ref_model.automatic_optimization and self.trainer.train_loop.automatic_optimization

Before the 1.2 release, manual optimization took care of zero_grad and accumulate_grad_batches for users.
However, we recently found a bug: manual optimization didn't actually work with multiple optimizers, because we didn't toggle the model based on optimizer.param_groups.

From 1.2, we decided to take a different approach to avoid this kind of bug.
Everything is left to the users. Lightning will just handle accelerator and precision.

I hope it helps !

Dear @pierresegonne, would you mind contacting me on Lightning Slack, so we can work together on improving the doc.

Best,
T.C

rubencart Feb 26, 2021

Is it correct that you can also set self.automatic_optimization = False in the LightningModule's __init__, to use manual optimization (as shown here)? Or do you really have to override the @property?

@tchaton wait, so you have to call zero_grad() yourself?! When I check the 1.2.1 PL code, I see that LightningOptimizer.step calls LightningOptimizer.__optimizer_step which calls model.optimizer_zero_grad which calls optimizer.zero_grad(). To be clear, the optimizers come from self.optimizers(use_pl_optimizer=True).

This is all quite confusing...

EDIT: sorry I was looking at an older version of the code, I see now that __optimizer_step only calls model.optimizer_zero_grad when automatic_optimization is True.
EDIT 2: aaand that doing self.automatic_optimization = False raises an error in 1.2.1 :-).
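
One plausible explanation for the two pieces of advice in this thread (override the property vs. assign in __init__) is a difference in whether the base class exposes a setter. This is an assumption, not something the thread confirms for any particular Lightning release. The plain-Python sketch below uses hypothetical stand-in classes (`ReadOnlyBase`, `SettableBase`), not LightningModule itself, to show the general mechanism.

```python
class ReadOnlyBase:
    """Stand-in for a base class with a read-only property (hypothetical)."""

    @property
    def automatic_optimization(self) -> bool:
        return True


class SettableBase:
    """Stand-in for a base class whose property also has a setter (hypothetical)."""

    def __init__(self):
        self._automatic_optimization = True

    @property
    def automatic_optimization(self) -> bool:
        return self._automatic_optimization

    @automatic_optimization.setter
    def automatic_optimization(self, value: bool) -> None:
        self._automatic_optimization = value


class OverrideModule(ReadOnlyBase):
    # Overriding the property works even when the base has no setter.
    @property
    def automatic_optimization(self) -> bool:
        return False


def try_assign(base_cls):
    """Attempt `obj.automatic_optimization = False` and report what happens."""
    obj = base_cls()
    try:
        obj.automatic_optimization = False
    except AttributeError:
        return "AttributeError"
    return obj.automatic_optimization


print(try_assign(ReadOnlyBase))   # assignment fails without a setter
print(try_assign(SettableBase))   # assignment succeeds with a setter
print(OverrideModule().automatic_optimization)
```

If that assumption holds, overriding the property is the only option while the base property is read-only, and plain assignment in __init__ becomes possible once a setter exists.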

akihironitta Mar 6, 2021

@rubencart Hi, sorry for the inconvenience.

Is it correct that you can also set self.automatic_optimization = False in the LightningModule's __init__, to use manual optimization (as shown here)? Or do you really have to override the @property?

Actually, the correct way to turn off automatic optimization is to set self.automatic_optimization=False in the LightningModule's __init__; we recently updated the docs for this in #6294.

EDIT: sorry I was looking at an older version of the code, I see now that __optimizer_step only calls model.optimizer_zero_grad when automatic_optimization is True.

Yes, as @tchaton mentioned, users need to manually call zero_grad, backward, and step in manual optimization from 1.2.

EDIT 2: aaand that doing self.automatic_optimization = False raises an error in 1.2.1 :-).

Could you open an issue about it? It would be very helpful if you could provide us with reproducible code on Google Colab.

KamWithK Mar 7, 2021
Author

What about learning rate schedulers? How do we use them with manual optimisation?

akihironitta Apr 5, 2021

@KamWithK Sorry for the delay. Currently (<=1.2.6), we do not support manually stepping learning rate schedulers, but we're working on it so that users can call lr_scheduler.step() at arbitrary intervals in manual optimization. The tracking issue is #6379.

akihironitta · 2021-04-11T00:24:34Z

I'm working on improving the optimization docs in #6907. It would be wonderful if you could have a look and leave some feedback there. Please let me know if there's still something unclear.

5 replies

rubencart Apr 15, 2021

With all due respect, but I don't understand why you don't make it a rule that any PR that changes code must immediately update docs accordingly? For me personally, docs being out of date with code all the time is the number 1 current drawback of your otherwise great library...
That said, thank you for working on updating the docs :-). It looks already better.
I still wonder, here it says "In manual mode we still automatically accumulate grad over batches if Trainer(accumulate_grad_batches=x) is set and you use optimizer.step()". How does that interact with doing grad accumulation manually like in your updated docs here?
Also, is it possible to accumulate the logs as well for the amount of steps you accumulate gradients for?

carmocca Apr 15, 2021

With all due respect, but I don't understand why you don't make it a rule that any PR that changes code must immediately update docs accordingly? For me personally, docs being out of date with code all the time is the number 1 current drawback of your otherwise great library...

Hey @rubencart, thank you for your support. We really try our best. I don't mean to give excuses, it's just debt we have in exchange for trying to move so fast.

In any case, feel free to complain about any problems / issues / inconsistencies you find when reading our docs. We value your opinions.

I still wonder, here it says "In manual mode we still automatically accumulate grad over batches if Trainer(accumulate_grad_batches=x) is set and you use optimizer.step()". How does that interact with doing grad accumulation manually like in your updated docs here?

I believe that tip is outdated and should be removed. cc: @akihironitta

is it possible to accumulate the logs as well for the amount of steps you accumulate gradients for?

How do you want to accumulate the logs?

rubencart Apr 16, 2021

Of course :-) I can imagine it can be difficult.
I was thinking of a flag for self.log like the commit one in wandb.log (link)? So you could more closely simulate a run with a bigger batch size by accumulating gradients? But on the other hand I guess that if you just plot metrics/losses versus sample nb and not batch nb it shouldn't make a difference.

rubencart Apr 16, 2021

I'm also getting AttributeError: 'Trainer' object has no attribute 'is_last_batch'

akihironitta Apr 16, 2021

@rubencart Thank you for having a look at the docs!

I still wonder, here it says "In manual mode we still automatically accumulate grad over batches if Trainer(accumulate_grad_batches=x) is set and you use optimizer.step()".

As Carlos mentioned, the statement you pointed out in the docs is outdated. ~~I'll update it~~ It's been updated in #6907.

I'm also getting AttributeError: 'Trainer' object has no attribute 'is_last_batch'

The attribute self.trainer.is_last_batch is still unavailable as #6825 hasn't been merged yet. However, I assume it'll be available in master soon.

akihironitta · 2021-04-24T07:40:21Z

From 1.3, Lightning supports learning rate schedulers in manual optimization for you to call lr_scheduler.step() at arbitrary intervals. Basically, you can achieve it by writing something like the following in your LightningModule:

from pytorch_lightning import LightningModule

class Model(LightningModule):
    def __init__(self):
        super().__init__()
        # switch to manual optimization
        self.automatic_optimization = False
        ...

    def training_step(self, batch, batch_idx):
        # forward, backward, optimization
        ...

        # get your LR scheduler (single scheduler)
        lr_scheduler = self.lr_schedulers()
        # with multiple schedulers, unpack them instead:
        # lr_scheduler1, lr_scheduler2 = self.lr_schedulers()

        # call `.step()`
        lr_scheduler.step()

The recently updated doc has an example for calling step() at every n epochs/batches and has some other examples of manual optimization, so please have a look: https://pytorch-lightning.readthedocs.io/en/latest/common/optimizers.html

4 replies

sanxing-chen May 23, 2021

Is there a full example for using FP16, gradient clipping, and gradient accumulation with manual optimization? I used the code below, but it cannot produce the same results as for automatic optimization.

    def on_after_backward(self):
        torch.nn.utils.clip_grad_norm_(self.parameters(), 1.0)
        return

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        loss = self.compute_loss(batch)
        self.manual_backward(loss / self.update_freq)
        if (batch_idx + 1) % self.update_freq == 0:
            opt.step()
            self.lr_schedulers().step()
            opt.zero_grad()

This code gives different results for precision=16 and precision=32. And they're all different than the results of automatic optimization.

akihironitta Jul 1, 2021

@sanxing-chen Hi, ~~could you provide a full script and your environment info?~~ Maybe better to create a new discussion/issue to discuss this.

Hannibal046 Nov 8, 2022

Is there a full example for using FP16, gradient clipping, and gradient accumulation with manual optimization? I used the code below, but it cannot produce the same results as for automatic optimization.
    def on_after_backward(self):
        torch.nn.utils.clip_grad_norm_(self.parameters(), 1.0)
        return

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        loss = self.compute_loss(batch)
        self.manual_backward(loss / self.update_freq)
        if (batch_idx + 1) % self.update_freq == 0:
            opt.step()
            self.lr_schedulers().step()
            opt.zero_grad()
This code gives different results for precision=16 and precision=32. And they're all different than the results of automatic optimization.

Any progress on this? A full example of manual optimization with FP16, gradient clipping and gradient accumulation is needed! The manual optimization example for gradient accumulation in the docs is incomplete. It should be self.manual_backward(loss / self.update_freq) rather than self.manual_backward(loss)
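
A worked check of the scaling point above, as a framework-free sketch: for a mean-reduced loss and equally sized micro-batches, accumulating the gradient of loss / update_freq reproduces the full-batch gradient, while accumulating the unscaled loss gives a gradient update_freq times larger. The one-parameter model, the data, and the helper `mse_grad` are illustrative assumptions; no Lightning or PyTorch calls are involved.

```python
import math


def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

update_freq = 2  # micro-batches accumulated before each optimizer step
micro_batches = [(xs[0:2], ys[0:2]), (xs[2:4], ys[2:4])]

# Reference: gradient computed on the whole batch at once.
full_batch_grad = mse_grad(w, xs, ys)

# Accumulating gradients of (loss / update_freq), as suggested above.
scaled = sum(mse_grad(w, mx, my) / update_freq for mx, my in micro_batches)

# Accumulating gradients of the unscaled loss.
unscaled = sum(mse_grad(w, mx, my) for mx, my in micro_batches)

print(full_batch_grad, scaled, unscaled)
```

This only addresses the accumulation scaling; it does not explain the precision=16 versus precision=32 differences reported above.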

akihironitta Nov 9, 2022

@Hannibal046 Could you create a new discussion for your request so that it is more visible to the community?
