Reinforcement Learning Agents for GAC²E through Hym with HaskTorch.
LibTorch is required, as per the HaskTorch documentation, and must be
symlinked into this directory. Then source `setenv` in your shell.
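The symlink step might look like this (the LibTorch path is hypothetical and
depends on where it was unpacked):

```sh
$ ln -s /path/to/libtorch ./libtorch
```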
For training, Hym must be up and running.
For tracking, mlflow and mlflow-hs must be installed.
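If no tracking server is running yet, one way to start a local MLFlow
instance matching the default tracking host and port below (installing via
pip is just one option):

```sh
$ pip install mlflow
$ mlflow server --host localhost --port 5000
```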
```sh
$ source setenv
$ stack build
```

With default options:

```sh
$ stack run
```

otherwise:

```sh
$ stack exec -- edelwace-exe [options]
```
```
Usage: edelwace-exe [-l|--algorithm ALGORITHM] [-H|--host HOST] [-P|--port PORT]
                    [-i|--ace ID] [-p|--pdk PDK] [-v|--var VARIANT]
                    [-a|--act ACTIONS] [-o|--obs OBSERVATIONS] [-f|--path FILE]
                    [-T|--tracking-host HOST] [-R|--tracking-port PORT]
  GACE RL Trainer

Available options:
  -l,--algorithm ALGORITHM DRL Algorithm, one of sac, td3, ppo (default: "sac")
  -H,--host HOST           Hym server host address (default: "localhost")
  -P,--port PORT           Hym server port (default: "7009")
  -i,--ace ID              ACE OP ID (default: "op2")
  -p,--pdk PDK             ACE Backend (default: "xh035")
  -v,--var VARIANT         GACE Environment Variant (default: "0")
  -a,--act ACTIONS         Dimensions of Action Space (default: 10)
  -o,--obs OBSERVATIONS    Dimensions of Observation Space (default: 39)
  -f,--path FILE           Checkpoint File Path (default: "./models")
  -T,--tracking-host HOST  MLFlow tracking server host address
                           (default: "localhost")
  -R,--tracking-port PORT  MLFlow tracking server port (default: "5000")
  -h,--help                Show this help text
```
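For example, training a TD3 agent in the continuous design space (all values
other than the algorithm are the defaults shown above):

```sh
$ stack exec -- edelwace-exe -l td3 -v 0 -H localhost -P 7009 -f ./models
```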
Dependencies:

- hasktorch
- libtorch-ffi
- mtl
- wreq
- aeson
- optparse-applicative
- mlflow-hs
Haddock documentation is available.
Caution: Excessive use of Unicode and Strictness.
Soft Actor Critic (SAC) agent for continuous action spaces. Start with
`-l sac` and `-v 0` for the continuous electrical design space.
It appears that state scaling / standardization makes things worse for SAC. The loss steadily increases and no learning occurs.
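As a sketch of SAC's core sampling step, a squashed-Gaussian action can be
drawn with the reparameterization trick directly from tensor primitives (the
function name and the mean/log-std inputs are hypothetical, since HaskTorch
does not yet ship a Normal distribution, as noted in the TODOs below):

```haskell
import qualified Torch as T

-- Reparameterized sample from a squashed Gaussian policy:
-- a = tanh(mu + std * eps), with eps ~ N(0, I)
sampleAction :: T.Tensor -> T.Tensor -> IO T.Tensor
sampleAction mu logStd = do
    eps <- T.randnIO' (T.shape mu)     -- standard normal noise
    let std = T.exp logStd             -- recover std from log-std
    pure . T.tanh $ mu + eps * std     -- squash into (-1, 1)
```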
Proximal Policy Optimization (PPO) agent for discrete and continuous action
spaces. Start with `-l ppo` and `-v 2` for the discrete electrical design
space.
Discrete PPO needs ~4k steps before plateauing around an average reward of
~0.4. The area is way smaller than the target, while the offset is not quite
reached.
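A minimal sketch of the clipped surrogate objective PPO optimizes, assuming
hypothetical tensors for the probability ratio and the advantage estimate and
a clipping parameter eps:

```haskell
import qualified Torch as T

-- Clipped PPO surrogate loss: L = -E[ min(r * A, clip(r, 1-eps, 1+eps) * A) ]
ppoLoss :: Float -> T.Tensor -> T.Tensor -> T.Tensor
ppoLoss eps ratio adv = negate . T.mean $ elemMin (ratio * adv) (clipped * adv)
  where
    clipped     = T.clamp (1.0 - eps) (1.0 + eps) ratio  -- clip probability ratio
    elemMin a b = T.where' (T.lt a b) a b                -- element-wise minimum
```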
Twin Delayed Deep Deterministic Policy Gradient (TD3) agent for continuous
action spaces. Start with `-l td3` and `-v 0` for the continuous electrical
design space.
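As an illustration of TD3's target policy smoothing, a sketch that adds
clipped Gaussian noise to a target action (sigma, c, and the action bounds
are hypothetical hyper-parameters):

```haskell
import qualified Torch as T

-- Target policy smoothing: a' = clamp(a + clamp(sigma * eps, -c, c), -1, 1)
smoothTarget :: Float -> Float -> T.Tensor -> IO T.Tensor
smoothTarget sigma c action = do
    eps <- T.randnIO' (T.shape action)
    let noise = T.clamp (negate c) c (T.mulScalar sigma eps)
    pure . T.clamp (-1.0) 1.0 $ action + noise
```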
This is only implemented for SAC and deactivated for the moment. To quote the
ERE paper:

> We show that SAC+PER can marginally improve the sample efficiency
> performance of SAC, but much less so than SAC+ERE.
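For reference, a sketch of the ERE sampling range: for the k-th of K updates,
the mini-batch is drawn uniformly from the most recent c_k transitions (eta
and cMin are hypothetical defaults; the schedule follows the ERE paper):

```haskell
-- c_k = max(N * eta^(k * 1000 / K), cMin), where N is the buffer size
ereRange :: Int -> Int -> Int -> Double -> Int -> Int
ereRange n k kMax eta cMin = max cMin . round
    $ fromIntegral n * eta ** (fromIntegral (k * 1000) / fromIntegral kMax)
```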
...
...
...
TODO:

- Implement SAC
- Implement TD3
- Implement PPO
- Implement PER
- Implement ERE
- Implement SAC+PER
- Implement SAC+ERE
- Implement SAC+ERE+PER
- Implement HER
- Implement TD3+HER
- Wait for Normal Distribution in HaskTorch
- Remove strictness where unnecessary
- Add agent loading ability
- Command Line Options
- MLFlow tracking
- Visualization (MLFlow?)