Releases: JMGaljaard/fltk-testbed
v0.3.1
In short
This release updates the code base such that:
- Documentation is updated to show how Docker images can be built and pushed.
- Memory issues are resolved/mitigated for FederatedLearning experiments.
- BatchOrchestrator has been tested and updated according to the findings from debugging.
- Supports waiting for `HistoricalArrivalTask`s.
- Supports stop-and-go deployment through parallelism configuration parameters.
What's Changed
- Demo by @JMGaljaard in #46
- 43 experiment replication batch by @JMGaljaard in #47
- Demo by @JMGaljaard in #48
- Develop by @JMGaljaard in #56
Full Changelog: v0.3.0...v0.3.1
Terraform and repetitions
In short
This release revamps the deployment on GKE with Terraform, making deployment a breeze. Furthermore, the dependency list is slimmed down from full Kubeflow to only the Kubeflow-training-operators. This alleviates the overhead on your cluster, as, for example, Istio is no longer required for deployment.
For experiments, the orchestrator now allows running repetitions of experiments directly. This lets you describe an experiment file once (e.g. a distributed learning configuration) and run it multiple times in a single deployment.
What's Changed
- Terraform deployment by @JMGaljaard in #44
- 43 experiment replication by @JMGaljaard in #45
Full Changelog: v0.2.2...v0.3.0
Federated num_epochs_per_round
Important information
In the configuration files (e.g. `configs/federated/*.yaml`) for Federated Learning, make sure that you use the following parameters as follows:
- `totalEpochs`: you MUSTN'T use this parameter; it will be removed after this year's course has finished. It was also not used before this release, but keep this in mind. A warning will be logged during execution if you do access it.
- `rounds`: the number of communication rounds, i.e. how many times the `Federator` node will sample and contact `Client` nodes.
- `epochsPerRound`: the number of epochs the `Client`s perform within a communication round.
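The parameters above might be combined as in the following illustrative fragment (values and surrounding keys are made up for this sketch; refer to the actual files under `configs/federated/` for a complete configuration):

```yaml
# Illustrative fragment only -- not a complete experiment configuration.
rounds: 50          # communication rounds driven by the Federator
epochsPerRound: 1   # local epochs each Client performs per round
# totalEpochs: ...  # deprecated -- do not set; accessing it logs a warning
```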
What's Changed
- Resolve federated learning rounds per epoch by @JMGaljaard in #41. Thanks @AbeleMM for spotting this error.
Full Changelog: v0.2.1...v0.2.2
Experiment parsing and testing
This release introduces the following changes:
- All `federated` and `data_parallel` experiment parsers now load all their respective parameters.
- The loss function can now be any of the functions that inherit from the `torch.nn.modules.loss._Loss` base class, referenced by their CamelCase name.
- A series of test cases was added to test the parsing of default values against configured values.
Issues closed
This version resolves the following issues. For more information read the changelog above.
Kubernetes FLTK
This release introduces the following changes:
- FederatedLearning datasets other than `FashionMNIST` failed to load without raising an `Exception`, thereby resulting in a failed execution of experiments. This required a revision of the experiment configuration objects that were used by the learners; see also point 4.
  - To help keep issues like these from being introduced undetected, a series of smoke tests was introduced that can be run locally.
- Distributed (`DistributedDataParallel`) experiments are now compatible again with KFLTK 🎉. See also #26.
- Configuration files with small floating-point numbers without a decimal (such as `10e-5`) will no longer be parsed as a string by `FedLearningConfig` and `DistLearningConfig` objects. This was mainly an issue for the `min_lr` configuration parameter, which would result in an exception being raised after 50 epochs.
- The (flat) configuration objects for `DistributedDataParallel` and `Federated` learning experiments have been partially unified. These objects are now renamed to `DistLearningConfig` and `FedLearningConfig` respectively. ⚠️ Make sure to update your imports.
  - Moreover, both classes can now be parsed directly from a `json` file. Note that the `FromYamlFile` function has been renamed to the more Pythonic name `from_yaml`.
  - Both objects now make use of the `Definitions` `Enum` classes, allowing for more readable parsing errors when an incorrect type is provided.
- A typo has been resolved in the Jinja templates such that the `aggregation` is now properly set.
- Datasets for Federated experiments now make use of the `test_batch_size` parameter, rather than a default of `16`. This parameter has also been introduced in the Jinja template.
- The `fltk.util.env` module has been renamed to `fltk.util.environment` to allow it to be added to the repository. This also resolves #30.
- A series of linter issues has been resolved (mostly warnings and some typos).
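The `10e-5` pitfall mentioned above can be reproduced in isolation. YAML 1.1 (which PyYAML follows) requires a decimal point in a float literal, so `10e-5` is loaded as a string; a loader must coerce it explicitly. The sketch below illustrates the problem and one possible coercion; the helper `coerce_float` is illustrative, not the actual fltk code.

```python
import yaml

# YAML 1.1 float syntax requires a '.', so `10e-5` resolves as a string.
doc = yaml.safe_load("min_lr: 10e-5")
assert isinstance(doc["min_lr"], str)


def coerce_float(value):
    """Illustrative fix: coerce string-typed numerics back to float."""
    return float(value) if isinstance(value, str) else value


min_lr = coerce_float(doc["min_lr"])
assert isinstance(min_lr, float)
```

Writing `1.0e-4` (with a decimal point) in the configuration file sidesteps the issue entirely.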
Issues closed
This version resolves the following issues. For more information read the changelog above.