Log wandb step using wandb native step arg in addition to the "step" key. #613
Making this change now would make it difficult to compare new and old runs. Is there any benefit? |
We can keep |
Hmm, shouldn't we also fix the logging line in open_clip/src/training/train.py Lines 341 to 344 in 91b7b51
To be consistent we probably should also use the number of training steps so far here, e.g.

    if args.wandb:
        assert wandb is not None, 'Please install wandb.'
        dataloader = data['train'].dataloader
        num_batches_per_epoch = dataloader.num_batches // args.accum_freq
        step = num_batches_per_epoch * epoch
        for name, val in metrics.items():
            wandb.log({f"val/{name}": val, 'epoch': epoch}, step=step)

|
Upon closer look I think we can consolidate prefixing and

    # In train_one_epoch():
    log_data = {"train/" + k: v for k, v in log_data.items()}
    if tb_writer is not None:
        for name, val in log_data.items():
            tb_writer.add_scalar(name, val, step)
    log_data['step'] = step
    if args.wandb:
        assert wandb is not None, 'Please install wandb.'
        wandb.log(log_data, step=step)

    # In evaluate():
    log_data = {"val/" + k: v for k, v in metrics.items()}
    if args.save_logs:
        if tb_writer is not None:
            for name, val in log_data.items():
                tb_writer.add_scalar(name, val, epoch)
        with open(os.path.join(args.checkpoint_path, "results.jsonl"), "a+") as f:
            f.write(json.dumps(metrics))
            f.write("\n")
    if args.wandb:
        assert wandb is not None, 'Please install wandb.'
        dataloader = data['train'].dataloader
        num_batches_per_epoch = dataloader.num_batches // args.accum_freq
        step = num_batches_per_epoch * epoch
        log_data['epoch'] = epoch
        wandb.log(log_data, step=step)

Overall I strongly vote for fixing this. Current |
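For reference, the prefixing and the epoch-to-step conversion proposed above can be tried in isolation. This is only a sketch: `prefix_metrics` and `eval_step` are hypothetical helper names that are not part of open_clip, and the batch counts are made-up numbers.

```python
def prefix_metrics(metrics, prefix):
    """Return a new dict whose keys carry a 'train/' or 'val/' style prefix."""
    return {f"{prefix}/{name}": value for name, value in metrics.items()}


def eval_step(num_batches, accum_freq, epoch):
    """Number of optimizer steps completed after `epoch` full epochs."""
    num_batches_per_epoch = num_batches // accum_freq
    return num_batches_per_epoch * epoch


# Hypothetical run: 1000 batches per epoch, gradients accumulated over 4 batches.
val_payload = prefix_metrics({"loss": 0.5}, "val")
val_payload["epoch"] = 3
step = eval_step(num_batches=1000, accum_freq=4, epoch=3)

# The payload and step would then be passed as wandb.log(val_payload, step=step).
print(val_payload, step)
```

With these numbers there are 250 optimizer steps per epoch, so metrics evaluated after epoch 3 land at step 750, on the same axis as the training metrics.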
Thanks @EIFY. Just pushed some changes. Tested on a run and wandb is looking good. We still have a lowercase |
the current I think changing this will make it really confusing when comparing old runs, where Step is meaningless, with new runs, where Step is meaningful. If we do change it, can we at least add a big warning in some logging section in the readme? |
Even with the change, people can keep doing comparisons the exact same way they did before, by looking at lowercase I'm also fine with keeping things as they are. I think it's cleaner and less confusing to have a |
@rom1504 wandb ui defaults to its own |
@rom1504 Could we merge this? I think concerns have been addressed. |
Yes merged |
…key. (mlfoundations#613)
* wandb step fix
* backwards compat fix
* update wandb calls
* update readme
Currently we are passing step to wandb as another metric instead of using the step argument to wandb.log (see https://docs.wandb.ai/ref/python/log). This causes two "step" variables to be logged and can cause inconsistencies.
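
To illustrate the inconsistency, here is a rough sketch that runs without wandb installed. `RecordingLogger` is a hypothetical and heavily simplified stand-in, not the wandb API: it only mimics the `log(data, step=...)` call shape and the default behaviour of advancing an internal counter when no step is given. `log_train_metrics` is likewise an assumed helper name.

```python
class RecordingLogger:
    """Hypothetical, simplified stand-in for wandb: stores (step, data) pairs."""

    def __init__(self):
        self.history = []
        self._auto_step = 0

    def log(self, data, step=None):
        # Without an explicit step, fall back to an internal call counter.
        if step is None:
            step = self._auto_step
            self._auto_step += 1
        self.history.append((step, dict(data)))


def log_train_metrics(logger, metrics, step):
    """Prefix metrics, keep the lowercase 'step' key, and pass step natively."""
    payload = {"train/" + name: value for name, value in metrics.items()}
    payload["step"] = step  # kept so older runs can still be compared on this key
    logger.log(payload, step=step)


old_style = RecordingLogger()
new_style = RecordingLogger()
for training_step in (100, 200):
    # Old behaviour: the training step is only another metric in the payload.
    old_style.log({"train/loss": 1.0, "step": training_step})
    # New behaviour: the training step is also the logger's own step.
    log_train_metrics(new_style, {"loss": 1.0}, training_step)

print([s for s, _ in old_style.history])
print([s for s, _ in new_style.history])
```

In the old style the logger's own step is just the call count (0, 1) while the "step" metric says 100 and 200, which is the mismatch described above. In the new style both agree.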