I added two new modules to a large multimodal model. Now I want to train module A with an MSE loss and module B with a CrossEntropy loss. I've tried setting automatic_optimization to False and running the backward pass manually, but the gradients are always None (I checked them with print(param.grad)). Besides, when I train with ddp, I get:
RuntimeError: It looks like your LightningModule has parameters that were not used in producing the loss returned by training_step. If this is intentional, you must enable the detection of unused parameters in DDP, either by setting the string value `strategy='ddp_find_unused_parameters_true'` or by setting the flag in the strategy with `strategy=DDPStrategy(find_unused_parameters=True)`.
When I train with deepspeed_stage_2_offload, I get:
UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
This might indicate that optimizer.step() is being skipped because the gradients are empty.
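If I read the DDP error correctly, the fix it suggests would look roughly like this (the accelerator/devices arguments here are just placeholders for my actual setup):

```python
# Rough sketch of the fix suggested by the error message; everything except
# the strategy flag is a placeholder for my real Trainer configuration.
import lightning.pytorch as pl
from lightning.pytorch.strategies import DDPStrategy

trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,
    # Let DDP tolerate parameters that receive no gradient from the loss.
    strategy=DDPStrategy(find_unused_parameters=True),
    # Equivalently: strategy="ddp_find_unused_parameters_true"
)
```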
I configure the optimizers with the following code:
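Roughly, it returns one optimizer (and scheduler) per new module; the optimizer type, learning rates, and scheduler below are placeholders, not my exact values:

```python
# Sketch of my configure_optimizers (a method of the LightningModule);
# the optimizer/scheduler choices and hyperparameters are placeholders.
import torch

def configure_optimizers(self):
    opt_a = torch.optim.AdamW(self.module_a.parameters(), lr=1e-4)
    opt_b = torch.optim.AdamW(self.module_b.parameters(), lr=1e-4)
    sched_a = torch.optim.lr_scheduler.CosineAnnealingLR(opt_a, T_max=1000)
    sched_b = torch.optim.lr_scheduler.CosineAnnealingLR(opt_b, T_max=1000)
    return [opt_a, opt_b], [sched_a, sched_b]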
In my training_step:
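Again a rough sketch; the forward calls and batch keys are placeholders, but the structure (manual optimization, one manual_backward per loss) is what I'm doing:

```python
# Sketch of my training_step; self.automatic_optimization = False is set in __init__.
import torch.nn.functional as F

def training_step(self, batch, batch_idx):
    opt_a, opt_b = self.optimizers()

    # Module A: regression target trained with MSE.
    loss_a = F.mse_loss(self.module_a(batch["x"]), batch["target_a"])
    opt_a.zero_grad()
    self.manual_backward(loss_a)
    opt_a.step()

    # Module B: classification target trained with CrossEntropy.
    loss_b = F.cross_entropy(self.module_b(batch["x"]), batch["target_b"])
    opt_b.zero_grad()
    self.manual_backward(loss_b)
    opt_b.step()

    self.log_dict({"loss_a": loss_a, "loss_b": loss_b})
```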
Please help me!!