[Docs] WandB Sweeps example with BenchMARL #105

Merged on Jul 9, 2024 (6 commits)
examples/sweep/wandb/readme.md (82 additions, 0 deletions)
@@ -0,0 +1,82 @@
# Using Weights & Biases (W&B) Sweeps with BenchMARL

Hyperparameter tuning can improve the performance of your RL agents. With BenchMARL and Hydra, a W&B hyperparameter sweep trains multiple models, each with a different set of hyperparameters. Define the sweep in `sweepconfig.yaml` and launch it from the command line.

## Prerequisites

- Install Weights & Biases on top of the BenchMARL requirements: `pip install wandb`.

- Update `benchmarl/conf/config.yaml` with your desired experiment setup, e.g.:

```yaml
defaults:
- experiment: base_experiment
- algorithm: ippo
- task: customenv/task_1
- model: layers/mlp
- model@critic_model: layers/mlp
- _self_

seed: 0
```
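The sweep can only optimize a metric that BenchMARL logs to W&B. A minimal sketch, assuming BenchMARL's experiment config exposes `loggers` and `evaluation` fields (check `benchmarl/conf/experiment/base_experiment.yaml` for the exact names and defaults; your setup may already have these values): because `_self_` is listed last in `defaults`, keys written directly in `config.yaml` override those loaded from `base_experiment`.

```yaml
# Appended to benchmarl/conf/config.yaml, below `seed: 0`.
# Assumption: `loggers` and `evaluation` are the field names in BenchMARL's experiment config.
experiment:
  loggers: [wandb]  # send run metrics to W&B, where the sweep reads them
  evaluation: true  # the sweep metric (eval/...) comes from evaluation episodes
```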

## Step 1: Define Your Sweep Configuration

First, create or modify the `sweepconfig.yaml` file. Check the [W&B Sweep Configuration Documentation](https://docs.wandb.ai/guides/sweeps/sweep-config-keys) for detailed configuration options.


The example YAML file already contains the basic elements required to work with BenchMARL; change the values to match your experiment setup. Because BenchMARL is configured with Hydra, parameter names must use dot notation (e.g., `experiment.lr`) rather than W&B's nested parameter blocks (see [this community discussion](https://community.wandb.ai/t/nested-sweep-configuration/3369)). Each dotted name is the path of the corresponding entry in BenchMARL's Hydra config.
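To make the difference concrete, here is the same learning-rate parameter written both ways. With the `${args_no_hyphens}` command used in the example `sweepconfig.yaml`, the agent passes each top-level parameter to `run.py` as a `name=value` argument (for instance `experiment.lr=0.0012`, a made-up sampled value), which Hydra reads as an override of that config path. The nested form is W&B's own syntax for grouping parameters and is the one to avoid here.

```yaml
# Use this: a flat key whose name is the Hydra config path.
parameters:
  experiment.lr:
    min: 0.000025
    max: 0.003

# Avoid this with BenchMARL: W&B's nested-parameter syntax.
# parameters:
#   experiment:
#     parameters:
#       lr:
#         min: 0.000025
#         max: 0.003
```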


```yaml
entity: "YOUR_ENTITY_NAME"

# options: bayes, random, grid
method: bayes

metric:
  name: eval/agent/reward/episode_reward_mean
  goal: maximize

parameters:
  experiment.lr:
    max: 0.003
    min: 0.000025
    # distribution: uniform

  experiment.max_n_iters:
    value: 321
```
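Learning rates are often searched on a log scale. As an optional variant of the snippet above (not part of the example file), W&B's `log_uniform_values` distribution samples between `min` and `max` uniformly in log space, and a fixed set of options can be listed with `values`:

```yaml
parameters:
  experiment.lr:
    distribution: log_uniform_values  # min/max are given as actual values, not logs
    min: 0.000025
    max: 0.003

  experiment.on_policy_minibatch_size:
    values: [64, 128, 256]  # a list of options is sampled categorically
```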

## Step 2: Initialize the Sweep

To run the sweep, initialize it using the following command in your terminal:

```bash
wandb sweep sweepconfig.yaml
```
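`wandb sweep` registers the whole file, including the keys that tell each agent what to launch. The Step 1 snippet leaves them out; the example `sweepconfig.yaml` sets them roughly as follows (the path and names are placeholders to replace with your own):

```yaml
program: PATH_TO_YOUR_DIRECTORY/benchmarl/run.py  # BenchMARL's Hydra entry point
project: "YOUR_PROJECT_NAME"
entity: "YOUR_ENTITY_NAME"

command:
  - ${env}              # /usr/bin/env on Unix, omitted on Windows
  - python
  - ${program}          # expands to the `program` path above
  - ${args_no_hyphens}  # sampled parameters as name=value, Hydra's override syntax
```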

W&B creates the sweep and prints the command that starts an agent, for example:

```text
wandb: Created sweep with ID: xyz123
wandb: View sweep at: https://wandb.ai/your_entity/your_project/sweeps/xyz123
wandb: Run sweep agent with: wandb agent your_entity/your_project/xyz123
```

## Step 3: Start Sweep Agents
Run the command provided in the terminal to start the sweep agent:

```bash
wandb agent your_entity/your_project/xyz123
```

This will start the agent and begin running experiments according to your sweep configuration.
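With the `bayes` (or `random`) method, the sweep keeps proposing new trials until you stop the agents; only a `grid` search ends on its own once every combination has run. One way to bound the total number of trials is W&B's top-level `run_cap` key; the value below is only an illustration:

```yaml
# Top-level key in sweepconfig.yaml, alongside `method` and `metric`.
run_cap: 20  # stop launching new runs once the sweep has 20 runs (illustrative value)
```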

## References

- [Configuring W&B Projects with Hydra](https://wandb.ai/adrishd/hydra-example/reports/Configuring-W-B-Projects-with-Hydra--VmlldzoxNTA2MzQw?galleryTag=posts&utm_source=fully_connected&utm_medium=blog&utm_campaign=hydra)
- [W&B Sweeps documentation](https://docs.wandb.ai/guides/sweeps)

examples/sweep/wandb/sweepconfig.yaml (55 additions, 0 deletions)
@@ -0,0 +1,55 @@
program: PATH_TO_YOUR_DIRECTORY/benchmarl/run.py
project: "YOUR_PROJECT_NAME"
entity: "YOUR_ENTITY_NAME"

method: bayes

metric:
  name: eval/agent/reward/episode_reward_mean
  goal: maximize

parameters:

  # experiment hyperparameters

  experiment.lr:
    max: 0.003
    min: 0.000025
    # distribution: uniform

  experiment.max_n_iters:
    value: 321
  experiment.on_policy_collected_frames_per_batch:
    value: 4040
  experiment.on_policy_n_minibatch_iters:
    values: [1, 2]

  experiment.on_policy_minibatch_size:
    values: [64, 128, 256]

  # algorithm hyperparameters
  algorithm.entropy_coef:
    max: 0.05
    min: 0
    distribution: uniform

  # task hyperparameters
  task.goal_type:
    value: "corr"
    # distribution: categorical

  # seed:
  #   max: 84
  #   min: 0
  #   distribution: int_uniform

early_terminate:
  type: hyperband
  max_iter: 27
  s: 3

command:
  - ${env}
  - python
  - ${program}
  - ${args_no_hyphens}