
Running experiments


In short, we provide two implementations of the Orchestrator:

  1. BatchOrchestrator: given a list of experiments, they are either executed one after the other or all deployed at once. It utilizes a SequentialArrivalGenerator to generate all experiments at once.
  2. SimulatedOrchestrator: given a list of experiment configurations, experiments arrive according to Poisson arrival statistics. It utilizes a SimulatedArrivalGenerator to release experiments after a sampled number of ticks, i.e., seconds by default (see the sketch below the list).
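
To make the difference concrete, the following is a minimal sketch of how Poisson arrivals can be simulated by drawing exponential inter-arrival times. The function and parameter names are illustrative and are not part of FLTK's SimulatedArrivalGenerator API.

import random
from typing import List, Tuple

def simulate_poisson_arrivals(experiments: List[str],
                              mean_inter_arrival: float = 60.0,
                              seed: int = 42) -> List[Tuple[float, str]]:
    """Assign each experiment an arrival time in ticks (seconds by default).

    Exponentially distributed inter-arrival gaps yield a Poisson arrival
    process, which is the statistic the SimulatedOrchestrator relies on.
    """
    rng = random.Random(seed)
    arrivals, clock = [], 0.0
    for experiment in experiments:
        clock += rng.expovariate(1.0 / mean_inter_arrival)  # sample the next gap
        arrivals.append((clock, experiment))
    return arrivals

if __name__ == "__main__":
    for tick, name in simulate_poisson_arrivals(["exp-1", "exp-2", "exp-3"]):
        print(f"t={tick:7.1f}s  deploy {name}")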

These implementations are relatively bare-bones, allowing them to be extended and modified for different types of experiments. This section focuses on the BatchOrchestrator, but the same approach applies to the SimulatedOrchestrator.

Running batches of experiments

A common use case for FLTK is running multiple sets of reproducible Distributed or Federated Learning experiments. Below is a short walkthrough that uses jinja-cli to render templates with ease. Alternatively, jinja2 or dataclass_json's to_json method can be used to generate large sets of experiments, e.g., together with a library for (fractional) factorial experiment designs.

N.B. We recommend using configuration files, as they make it straightforward to retrieve and re-generate experiments.

Leveraging templating

Here we use the templating language jinja, which allows templates to be filled programmatically. This lets us treat the configuration as a template in which we only specify the parameters we want to vary.

We will use the following example to study the impact of training with different levels of parallelism on CIFAR-100 with ResNet-18. For ease of use, we define the parameters in the same file. Alternatively, a separate parameter file (such as the values.yaml used further below) can be created, containing any parameters you want to use; a sketch of such a file follows after the template.

{%- set cores = "2000m" -%}
{%- set memory = "2Gi" -%}
{%- set train_batch = 64 -%}
{%- set test_batch = 64 -%}
{%- set parallel_list = [25, 50, 75]  -%}
{%- set seeds = [42, 360, 20] -%}
{
  "trainTasks": [
{%- for parallelism in parallel_list -%}
{#- remember the outer loop so the inner loop can decide on trailing commas -#}
{%- set outer_loop = loop -%}
{%- for seed in seeds -%}
  {
    "type": "distributed",
    "jobClassParameters": [
      {
        "networkConfiguration": {
          "network": "Cifar100ResNet",
          "lossFunction": "CrossEntropyLoss",
          "dataset": "cifar100"
        },
        "systemParameters": {
          "dataParallelism": {{ parallelism }},
          "configurations": {
            "Master": {
              "cores": "{{ (cores | safe) }}",
              "memory": "{{ (memory | safe) }}"
            },
            "Worker": {
              "cores": "{{ (cores | safe) }}",
              "memory": "{{ (memory | safe) }}"
            }
          }
        },
        "hyperParameters": {
          "default": {
            "batchSize": {{ train_batch }},
            "testBatchSize": {{ test_batch }},
            "learningRateDecay": 0.1,
            "optimizerConfig": {
              "type": "SGD",
              "learningRate": 0.01,
              "momentum": 0.9
            },
            "schedulerConfig": {
              "schedulerStepSize": 50,
              "schedulerGamma": 0.5,
              "minimumLearningRate": 1e-10
            }
          },
          "configurations": {
            "Master": null,
            "Worker": null
          }
        },
        "learningParameters": {
          "totalEpochs": 100,
          "rounds": 100,
          "epochsPerRound": 1,
          "cuda": false,
          "clientsPerRound": {{ parallelism }},
          "dataSampler": {
            "type": "uniform",
            "qValue": 0.07,
            "seed": {{ seed }},
            "shuffle": true
          },
          "aggregation": "FedAvg"
        }
      }
    ]
  }{% if not (outer_loop.last and loop.last) %},{% endif %}
{%- endfor -%}
{%- endfor -%}
  ]
}
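
If you prefer to keep the parameters outside the template, they can live in a separate parameter file such as the values.yaml referenced in the second command below. A minimal sketch of such a file, assuming the keys simply mirror the {% set %} variables at the top of the template (which you would then remove):

cores: "2000m"
memory: "2Gi"
train_batch: 64
test_batch: 64
parallel_list: [25, 50, 75]
seeds: [42, 360, 20]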

Then running

jinja -d experiment.json.jinja                     # With parameters defined within jinja template
jinja -d experiment.json.jinja -f values.yaml      # With parameters defined in separate file  `values.yaml`

will populate the template; the resulting configuration can then be used to deploy experiments.
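
As mentioned above, jinja2 can also be used directly from Python to render the template. A minimal sketch, assuming the template is stored as experiment.json.jinja in the working directory:

import json
from pathlib import Path

from jinja2 import Environment, FileSystemLoader

# Look up templates in the current directory and render the experiment template.
env = Environment(loader=FileSystemLoader("."))
rendered = env.get_template("experiment.json.jinja").render()

# Parse the result to make sure the generated configuration is valid JSON.
config = json.loads(rendered)
Path("experiment.json").write_text(json.dumps(config, indent=2))
print(f"Generated {len(config['trainTasks'])} train tasks")

If the parameters live in a separate file, load them first (e.g. with yaml.safe_load) and pass them as keyword arguments to render().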

Programmatic templating

For more complex configurations, e.g. fractional factorial experiments, templating can be done programmatically, in a similar fashion to how Pod templates are created in FLTK (see the corresponding code in the repository). Combining this with a package such as pyDOE allows large sets of experiments to be generated and run with little effort.
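
As a rough illustration, the sketch below combines pyDOE's fullfact design with plain dictionaries to emit a batch of experiment descriptions. It covers only a fragment of the configuration shown above, and both the field selection and the use of fullfact are illustrative assumptions; treat it as a starting point, not FLTK's actual generation code.

import json

from pyDOE import fullfact  # full-factorial design; pyDOE2 provides the same function

# Factor levels to sweep over, mirroring the jinja example above.
parallelism_levels = [25, 50, 75]
seed_levels = [42, 360, 20]

# fullfact returns one row per experiment, each entry being a level index per factor.
design = fullfact([len(parallelism_levels), len(seed_levels)])

train_tasks = []
for parallelism_idx, seed_idx in design:
    # Only a small subset of the full FLTK configuration is filled in here.
    train_tasks.append({
        "type": "distributed",
        "jobClassParameters": [{
            "systemParameters": {"dataParallelism": parallelism_levels[int(parallelism_idx)]},
            "learningParameters": {
                "clientsPerRound": parallelism_levels[int(parallelism_idx)],
                "dataSampler": {"type": "uniform", "seed": seed_levels[int(seed_idx)]},
            },
        }],
    })

with open("experiment.json", "w") as fp:
    json.dump({"trainTasks": train_tasks}, fp, indent=2)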