Hello. I configured the Trainer with the following parameters:

    trainer = Trainer(
        driver="torch",
        train_dataloader=dl["train"],
        evaluate_dataloaders=dl["dev"],
        device=[4, 7],
        callbacks=callback,
        optimizers=optimizer,
        n_epochs=args.epoch,
        accumulation_steps=args.accumulation_steps,
        torch_kwargs={'ddp_kwargs': {'find_unused_parameters': True}},
    )
    trainer.run()

Training does start on two GPUs, but the loss printed during training is NaN, and every metric printed at each epoch has the same value. Where is the problem?