Experiment stuck when 100% training completed 📡 #760

daviddelriod · 2023-03-19T19:19:18Z

I have successfully started an FL experiment using the interactive API and everything seems to work perfectly, but once the collaborators finish training, the experiment gets stuck. It simply stops there and does not perform any aggregation or any further operation.

I also tried to view the metrics from the notebook, but it does not show any progress bar or metric in the output.

The experiment is defined like this:

fl_experiment.start(
    model_provider=model_interface,
    task_keeper=task_interface,
    data_loader=fed_dataset,
    rounds_to_train=5,
    opt_treatment='CONTINUE_GLOBAL'
)

Am I missing a task for the aggregation function or something related? (I have the same configuration as tinyimage's example.)

Thanks in advance!!

Answered by daviddelriod

daviddelriod · 2023-03-22T17:22:50Z

The problem was that I had put the tqdm progress bar around the dataloader and not around the loop over epochs, so the bar only accounted for the first epoch while the other epochs kept executing "in the background". As shown in the image, the training bar was at 100%. However, 7 epochs still remained, since only one epoch (one complete iteration of the dataloader) had been trained.
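
For anyone hitting the same thing, here is a minimal sketch of the fix described above, with the bar wrapped around the epoch loop so it only reaches 100% once every epoch has run. This is not the original task code: `train`, `batches`, `epochs` and `train_step` are hypothetical names for illustration, and the `ImportError` fallback is only there so the snippet runs without tqdm installed.

```python
try:
    from tqdm import tqdm
except ImportError:
    # Fallback (assumption for this sketch): no bar, just pass the iterable through.
    def tqdm(iterable=None, **kwargs):
        return iterable


def train(batches, epochs, train_step):
    """Run `epochs` full passes over `batches` and return the number of steps done.

    All names here are hypothetical; `train_step` stands in for whatever
    forward/backward/optimizer code the real task runs on each batch.
    """
    steps_done = 0
    # The bar tracks epochs, so 100% means the whole task has finished,
    # not just the first pass over the dataloader.
    for _epoch in tqdm(range(epochs), desc="train"):
        for batch in batches:
            train_step(batch)
            steps_done += 1
    return steps_done
```

With the bar on the outer loop, a task that still has epochs left shows a partially filled bar instead of sitting at 100% and looking stuck.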
