forked from kvfrans/parallel-trpo
-
Notifications
You must be signed in to change notification settings - Fork 0
/
trials.txt
27 lines (18 loc) · 1.72 KB
/
trials.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
python main.py --task Reacher-v1 --decay_method adaptive-margin --timesteps_per_batch 1000 --max_kl 0.001 --kl_adapt 0.0005 --timestep_adapt 300
python main.py --task Reacher-v1 --decay_method adaptive --timesteps_per_batch 1000 --max_kl 0.001 --kl_adapt 0.0005 --timestep_adapt 300
python main.py --task Reacher-v1 --decay_method adaptive --timesteps_per_batch 20000 --max_kl 0.001 --kl_adapt 0.0005 --timestep_adapt 0
python main.py --task Reacher-v1 --decay_method adaptive --timesteps_per_batch 1000 --max_kl 0.001 --kl_adapt 0 --timestep_adapt 300
python main.py --task Reacher-v1 --decay_method none --timesteps_per_batch 20000 --max_kl 0.001
python main.py --task Reacher-v1 --decay_method none --timesteps_per_batch 20000 --max_kl 0.005
python main.py --task Reacher-v1 --decay_method none --timesteps_per_batch 20000 --max_kl 0.010
# python main.py --task Reacher-v1 --decay_method none --timesteps_per_batch 20000 --max_kl 0.001
python main.py --task Reacher-v1 --decay_method none --timesteps_per_batch 10000 --max_kl 0.001
python main.py --task Reacher-v1 --decay_method none --timesteps_per_batch 5000 --max_kl 0.001
python main.py --task Reacher-v1 --decay_method none --timesteps_per_batch 1500 --max_kl 0.001
do all of these, and then repeat for Swimmer, Hopper, and HalfCheetah.
Status so far:
On reacher: lower steps and higher KL always best, and adapting both works
On halfcheetah: lower steps better but more variation, higher KL always best, adapting both crashes, adapting KL is good
On swimmer: midground best for steps/KL, adapting both crashes, adapting steps is good
try: adapting with a margin of 10%? you need to get higher than 10% avg to adapt.
this should let the numbers hold in the middle (for ex. in swimmer env)