Running experiments
In short, we provide 2 implementations of the Orchestrator:
- BatchOrchestrator: given a list of experiments, they will be executed one after the other, or all deployed at once. It utilizes a SequentialArrivalGenerator to generate all experiments at once.
- SimulatedOrchestrator: given a list of experiment configs, experiments arrive according to Poisson arrival statistics. It utilizes a SimulatedArrivalGenerator to generate experiments according to a sampled number of ticks, i.e., seconds by default.
These implementations are relatively bare-bones, allowing for extension and modification for different types of experiments. This section focuses on the BatchOrchestrator, but naturally, this extends to working with SimulatedOrchestrators.
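To make the difference in arrival behaviour concrete, below is a minimal, standalone sketch (not FLTK's actual generator code) of what sequential versus Poisson arrivals boil down to: the simulated variant samples exponentially distributed inter-arrival times, which yields Poisson arrival statistics. The function names and the mean_inter_arrival_ticks parameter are illustrative only.

import random

def sequential_arrivals(experiments):
    # BatchOrchestrator-style: every experiment is available at tick 0.
    return [(0, experiment) for experiment in experiments]

def poisson_arrivals(experiments, mean_inter_arrival_ticks=60):
    # SimulatedOrchestrator-style: inter-arrival times are drawn from an
    # exponential distribution, so arrivals follow a Poisson process.
    tick, schedule = 0.0, []
    for experiment in experiments:
        tick += random.expovariate(1.0 / mean_inter_arrival_ticks)
        schedule.append((round(tick), experiment))
    return schedule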
A use case for FLTK is to run multiple sets of reproducible Distributed or Federated learning experiments. Here follows a short description that utilizes jinja-cli to render templates with ease. Alternatively, jinja2 or dataclass_json's to_json method can be used to generate large sets of experiments, e.g., using a library for (fractional) factorial experiments.
N.B. We recommend utilizing configuration files as they allow for retrieving/re-generating experiments.
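If you prefer generating configurations from Python rather than from templates, the sketch below shows the dataclass_json route mentioned above. The ExperimentConfig dataclass and its fields are hypothetical placeholders, not FLTK's real configuration classes; only the to_json usage is the point.

from dataclasses import dataclass
from dataclasses_json import dataclass_json

@dataclass_json
@dataclass
class ExperimentConfig:
    # Hypothetical stand-in for an FLTK experiment configuration.
    data_parallelism: int
    seed: int
    batch_size: int = 64

# One configuration per (parallelism, seed) combination.
configs = [ExperimentConfig(p, s) for p in (25, 50, 75) for s in (42, 360, 20)]
for index, config in enumerate(configs):
    with open(f"experiment_{index}.json", "w") as file:
        file.write(config.to_json(indent=2))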
Here we use a templating language, jinja, which allows us to fill templates programmatically. This lets us treat the configuration as a template for which we only need to specify the parameters we want to change.
We will utilize the following example to see the impact of training with different levels of parallelism on CIFAR100 with ResNet-18. For ease of use, we add the parameters in the same file. Alternatively, a parameter.yaml file can be created, containing any parameters you want to use.
{%- set cores = "2000m" -%}
{%- set memory = "2Gi" -%}
{%- set train_batch = 64 -%}
{%- set test_batch = 64 -%}
{%- set parallel_list = [25, 50, 75] -%}
{%- set seeds = [42, 360, 20] -%}
{
"trainTasks": [
{%- for parallelism in parallel_list -%}
{%- set outer_loop = loop -%}
{%- for seed in seeds -%}
{
"type": "distributed",
"jobClassParameters": [
{
"networkConfiguration": {
"network": "Cifar100ResNet",
"lossFunction": "CrossEntropyLoss",
"dataset": "cifar100"
},
"systemParameters": {
"dataParallelism": {{ parallelism }},
"configurations": {
"Master": {
"cores": "{{ (cores | safe) }}",
"memory": "{{ (memory | safe) }}"
},
"Worker": {
"cores": "{{ (cores | safe) }}",
"memory": "{{ (memory | safe) }}"
}
}
},
"hyperParameters": {
"default": {
"batchSize": {{ train_batch }},
"testBatchSize": {{ test_batch }},
"learningRateDecay": 0.1,
"optimizerConfig": {
"type": "SGD",
"learningRate": 0.01,
"momentum": 0.9
},
"schedulerConfig": {
"schedulerStepSize": 50,
"schedulerGamma": 0.5,
"minimumLearningRate": 1e-10
}
},
"configurations": {
"Master": null,
"Worker": null
}
},
"learningParameters": {
"totalEpochs": 100,
"rounds": 100,
"epochsPerRound": 1,
"cuda": false,
"clientsPerRound": {{ parallelism }},
"dataSampler": {
"type": "uniform",
"qValue": 0.07,
"seed": {{ seed }},
"shuffle": true
},
"aggregation": "FedAvg"
}
}
]
}{% if not (outer_loop.last and loop.last) %},{% endif %}
{%- endfor -%}
{%- endfor -%}
]
}
Then running
jinja -d experiment.json.jinja # With parameters defined within jinja template
jinja -d experiment.json.jinja -f values.yaml # With parameters defined in separate file `values.yaml`
will populate the template, which can then be used to deploy experiments.
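The same rendering can also be done from Python with the jinja2 API, which is convenient when rendering is part of a larger script. The file names below are the ones used in this example.

from jinja2 import Environment, FileSystemLoader

environment = Environment(loader=FileSystemLoader("."), trim_blocks=True, lstrip_blocks=True)
template = environment.get_template("experiment.json.jinja")
# The example template sets its parameters internally, so no context is needed;
# pass keyword arguments to render() when parameters live outside the template.
with open("experiment.json", "w") as file:
    file.write(template.render())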
For more complex configurations, e.g. fractional factorial experiments, templating can be done programmatically, in a similar fashion as templates are created for Pods in FLTK. This can be seen here. Combining this with a package like pyDOE allows for the generation and running of large sets of experiments in little time.
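As a rough sketch of that combination, the snippet below uses pyDOE's fullfact to enumerate a factorial design and renders one configuration file per design point. It assumes a per-experiment template that accepts parallelism and seed as render parameters, unlike the example template above, which loops over the lists internally; the factor levels and file names are illustrative.

from jinja2 import Template
from pyDOE import fullfact

parallelism_levels = [25, 50, 75]
seed_levels = [42, 360, 20]

# fullfact returns a matrix of level indices, one row per design point.
design = fullfact([len(parallelism_levels), len(seed_levels)]).astype(int)

with open("experiment.json.jinja") as file:
    template = Template(file.read())

for row, (p_index, s_index) in enumerate(design):
    rendered = template.render(parallelism=parallelism_levels[p_index],
                               seed=seed_levels[s_index])
    with open(f"experiment_{row}.json", "w") as file:
        file.write(rendered)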