Experiment stuck when 100% training completed #760
I have successfully started an FL experiment using the interactive API and everything seems to work, but once the collaborators finish training, the experiment gets stuck: it reaches that point and performs no aggregation or any further operation. I also tried to view the metrics from the notebook, but no progress bar or metric appears in the output. The experiment is defined like this:
Am I missing a task for the aggregation function or something related? (I have the same configuration as the tinyimage example.) Thanks in advance!
Replies: 1 comment
The problem was that I had wrapped the dataloader in the tqdm progress bar instead of the loop over epochs, so the bar only tracked the first epoch while the remaining epochs ran "in the background". As shown in the image, the training bar is at 100%. However, 7 epochs still remained, because only one epoch (one complete pass over the dataloader) had been trained.
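For anyone hitting the same thing, here is a minimal sketch of the fix, not my actual training task. The names `train`, `train_step` and `dummy_loader` are hypothetical stand-ins, the epoch count of 8 is only illustrative, and nothing here is part of any framework API. The only point is that the bar wraps the epoch loop, so 100% really means all epochs are done.

```python
try:
    from tqdm import tqdm
except ImportError:
    # Fallback so the sketch still runs where tqdm is not installed.
    def tqdm(iterable, **kwargs):
        return iterable


def train(dataloader, epochs, train_step):
    """Run `epochs` full passes over `dataloader`; return the number of batches processed."""
    batches_seen = 0
    # The bar wraps the epoch loop, so it reaches 100% only after the last epoch.
    # Wrapping `dataloader` instead would hit 100% after a single pass.
    for _epoch in tqdm(range(epochs), desc="epochs"):
        for batch in dataloader:
            train_step(batch)
            batches_seen += 1
    return batches_seen


# Hypothetical stand-ins: four "batches" and a no-op training step.
dummy_loader = [[0, 1], [2, 3], [4, 5], [6, 7]]
total = train(dummy_loader, epochs=8, train_step=lambda batch: None)
```

If per-batch progress is also wanted, the dataloader can be wrapped in a second, inner bar; passing `leave=False` to that inner `tqdm` call clears it after each epoch so only the epoch bar stays on screen.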