V0.3: Upgrade RL Workflow; Add RL Benchmarks; Update Package Version (#588)

* call policy update only for AbsCorePolicy

* add limitation of AbsCorePolicy in Actor.collect()

* refined actor to return only experiences for policies that received new experiences

* fix MsgKey issue in rollout_manager

* fix typo in learner

* call exit function for parallel rollout manager

* update supply chain example distributed training scripts

* 1. moved exploration scheduling to rollout manager; 2. fixed bug in lr schedule registration in core model; 3. added parallel policy manager prototype

* reformat render

* fix supply chain business engine action type problem

* reset supply chain example render figsize from 4 to 3

* Add render to all modes of supply chain example

* fix OR policy typos

* 1. added parallel policy manager prototype; 2. used training ep for evaluation episodes

* refined parallel policy manager

* updated rl/__init__.py

* fixed lint issues and CIM local learner bugs

* deleted unwanted supply_chain test files

* revised default config for cim-dqn

* removed test_store.py as it is no longer needed

* 1. changed Actor class to rollout_worker function; 2. renamed algorithm to algorithms

* updated figures

* removed unwanted import

* refactored CIM-DQN example

* added MultiProcessRolloutManager and MultiProcessTrainingManager

* updated doc

* lint issue fix

* lint issue fix

* fixed import formatting

* [Feature] Prioritized Experience Replay (#355)

* added prioritized experience replay

* deleted unwanted supply_chain test files

* fixed import order

* import fix

* fixed lint issues

* fixed import formatting

* added note in docstring that rank-based PER has yet to be implemented

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
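
A minimal sketch of the proportional variant this PR adds (class and parameter names here are illustrative, not the actual MARO API; as noted above, rank-based PER is not implemented yet):

    import numpy as np

    class ProportionalReplay:
        """Sample experiences with probability proportional to priority**alpha."""

        def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-6):
            self.capacity, self.alpha, self.eps = capacity, alpha, eps
            self.data, self.priorities = [], []

        def push(self, exp, td_error: float = 1.0):
            if len(self.data) >= self.capacity:
                self.data.pop(0)
                self.priorities.pop(0)
            self.data.append(exp)
            self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

        def sample(self, k: int):
            probs = np.array(self.priorities) / sum(self.priorities)
            indices = np.random.choice(len(self.data), size=k, p=probs)
            return [self.data[i] for i in indices], indices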

* rm AbsDecisionGenerator

* small fixes

* bug fix

* reorganized training folder structure

* fixed lint issues

* fixed lint issues

* policy manager refined

* lint fix

* restructured CIM-dqn sync code

* added policy version index and used it as a measure of experience staleness
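
The version index makes staleness a simple integer comparison; an illustrative check (field and parameter names are assumptions, not the toolkit's interface):

    def is_fresh(exp_policy_version: int, current_version: int, max_lag: int) -> bool:
        """Keep an experience only if its source policy is at most max_lag versions old."""
        return current_version - exp_policy_version <= max_lag

    # With max_lag=2, experiences generated by policy version 5 are discarded
    # once the policy reaches version 8.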

* lint issue fix

* lint issue fix

* switched log_dir and proxy_kwargs order

* cim example refinement

* eval schedule sorted only when it's a list

* eval schedule sorted only when it's a list

* update sc env wrapper

* added docker scripts for cim-dqn

* refactored example folder structure and added workflow templates

* fixed lint issues

* fixed lint issues

* fixed template bugs

* removed unused imports

* refactoring sc in progress

* simplified cim meta

* fixed build.sh path bug

* template refinement

* deleted obsolete svgs

* updated learner logs

* minor edits

* refactored templates for easy merge with async PR

* added component names for rollout manager and policy manager

* fixed incorrect position to add last episode to eval schedule

* added max_lag option in templates

* formatting edit in docker_compose_yml script

* moved local learner and early stopper outside sync_tools

* refactored rl toolkit folder structure

* refactored rl toolkit folder structure

* moved env_wrapper and agent_wrapper inside rl/learner

* refined scripts

* fixed typo in script

* changes needed for running sc

* removed unwanted imports

* config change for testing sc scenario

* changes for perf testing

* Asynchronous Training (#364)

* remote inference code draft

* changed actor to rollout_worker and updated init files

* removed unwanted import

* updated inits

* more async code

* added async scripts

* added async training code & scripts for CIM-dqn

* changed async to async_tools to avoid conflict with python keyword

* reverted unwanted change to dockerfile

* added doc for policy server

* addressed PR comments and fixed a bug in docker_compose_yml.py

* fixed lint issue

* resolved PR comment

* resolved merge conflicts

* added async templates

* added proxy.close() for actor and policy_server

* fixed incorrect position to add last episode to eval schedule

* reverted unwanted changes

* added missing async files

* rm unwanted echo in kill.sh

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* renamed sync to synchronous and async to asynchronous to avoid conflict with keyword

* added missing policy version increment in LocalPolicyManager

* refined rollout manager recv logic

* removed a debugging print

* added sleep in distributed launcher to avoid hanging

* updated api doc and rl toolkit doc

* refined dynamic imports using importlib

* 1. moved policy update triggers to policy manager; 2. added version control in policy manager

* fixed a few bugs and updated cim RL example

* fixed a few more bugs

* added agent wrapper instantiation to workflows

* added agent wrapper instantiation to workflows

* removed abs_block and added max_prob option for DiscretePolicyNet and DiscreteACNet

* fixed incorrect get_ac_policy signature for CIM

* moved exploration inside core policy

* added state to exploration call to support context-dependent exploration
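
Passing the state through enables schemes like state-dependent epsilon; a hedged sketch (epsilon_for is a made-up hook, not part of MARO):

    import random

    def epsilon_greedy(state, greedy_action, action_space, epsilon_for):
        """Explore with an epsilon that may depend on the current state."""
        if random.random() < epsilon_for(state):  # context-dependent rate
            return random.choice(action_space)
        return greedy_action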

* separated non_rl_policy_index and rl_policy_index in workflows

* modified sc example code according to workflow changes

* modified sc example code according to workflow changes

* added replay_agent_ids parameter to get_env_func for RL examples

* fixed a few bugs

* added maro/simulator/scenarios/supply_chain as bind mount

* added post-step, post-collect, post-eval and post-update callbacks

* fixed lint issues

* fixed lint issues

* moved instantiation of policy manager inside simple learner

* fixed env_wrapper get_reward signature

* minor edits

* removed get_experience kwargs from env_wrapper

* 1. renamed step_callback to post_step in env_wrapper; 2. added get_eval_env_func to RL workflows

* added rollout exp distribution option in RL examples

* removed unwanted files

* 1. made logger internal in learner; 2. removed logger creation in abs classes

* checked out supply chain test files from v0.2_sc

* 1. added missing model.eval() to choose_action; 2. added entropy features to AC

* fixed a bug in ac entropy

* abbreviated coefficient to coeff

* removed -dqn from job name in rl example config

* added tmp patch to dev.df

* renamed image name for running rl examples

* added get_loss interface for core policies

* added policy manager in rl_toolkit.rst

* 1. env_wrapper bug fix; 2. policy manager update logic refinement

* refactored policy and algorithms

* policy interface redesigned

* refined policy interfaces

* fixed typo

* fixed bugs in refactored policy interface

* fixed some bugs

* refactoring in progress

* policy interface and policy manager redesigned

* 1. fixed bugs in ac and pg; 2. fixed bugs in rl workflow scripts

* fixed bug in distributed policy manager

* fixed lint issues

* fixed lint issues

* added scipy in setup

* 1. trimmed rollout manager code; 2. added option to docker scripts

* updated api doc for policy manager

* 1. simplified rl/learning code structure; 2. fixed bugs in rl example docker script

* 1. simplified rl example structure; 2. fixed lint issues

* further rl toolkit code simplifications

* more numpy-based optimization in RL toolkit

* moved replay buffer inside policy

* bug fixes

* numpy optimization and associated refactoring

* extracted shaping logic out of env_sampler

* fixed bug in CIM shaping and lint issues

* preliminary implementation of parallel batch inference

* fixed bug in ddpg transition recording

* put get_state, get_env_actions, get_reward back in EnvSampler

* simplified exploration and core model interfaces

* bug fixes and doc update

* added improve() interface for RLPolicy for single-thread support

* fixed simple policy manager bug

* updated doc, rst, notebook

* updated notebook

* fixed lint issues

* fixed entropy bugs in ac.py

* reverted to simple policy manager as default

* 1. unified single-thread and distributed mode in learning_loop.py; 2. updated api doc for algorithms and rst for rl toolkit

* fixed lint issues and updated rl toolkit images

* removed obsolete images

* added back agent2policy for general workflow use

* V0.2 rl refinement dist (#377)

* Support `slice` operation in ExperienceSet
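
Slicing an experience container can be done by forwarding the slice to every field; a rough sketch (the real ExperienceSet holds more fields):

    class ExperienceSet:
        def __init__(self, states, actions, rewards, next_states):
            self.states, self.actions = states, actions
            self.rewards, self.next_states = rewards, next_states

        def __getitem__(self, key: slice) -> "ExperienceSet":
            # exp_set[10:20] returns a new set sliced field by field.
            return ExperienceSet(
                self.states[key], self.actions[key],
                self.rewards[key], self.next_states[key],
            )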

* Support naive distributed policy training by proxy

* Dynamically allocate trainers according to the number of experiences

* code check

* code check

* code check

* Fix a bug in distributed training with no gradient

* Code check

* Move Back-Propagation from trainer to policy_manager and extract trainer-allocation strategy

* 1. call allocate_trainer() at the start of update(); 2. refine according to code review

* Code check

* Refine code with new interface

* Update docs of PolicyManager and ExperienceSet

* Add images for rl_toolkit docs

* Update diagram of PolicyManager

* Refine with new interface

* Extract allocation strategy into `allocation_strategy.py`
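
One plausible shape for such a strategy (illustrative only; the actual allocation_strategy.py may differ): hand out trainers in proportion to each policy's volume of new experiences:

    def allocate_by_experience(num_trainers: int, exp_counts: dict) -> dict:
        """exp_counts maps policy name -> number of new experiences."""
        total = sum(exp_counts.values()) or 1
        return {
            name: max(1, round(num_trainers * n / total))
            for name, n in exp_counts.items()
        }

    # allocate_by_experience(6, {"dqn": 300, "ac": 60}) -> {"dqn": 5, "ac": 1}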

* add `distributed_learn()` in policies for data-parallel training

* Update doc of RL_toolkit

* Add gradient workers for data-parallel

* Refine code and update docs

* Lint check

* Refine by comments

* Rename `trainer` to `worker`

* Rename `distributed_learn` to `learn_with_data_parallel`

* Refine allocator and remove redundant code in policy_manager

* remove arguments in allocate_by_policy and so on

* added checkpointing for simple and multi-process policy managers

* 1. bug fixes in checkpointing; 2. removed version and max_lag in rollout manager

* added missing set_state and get_state for CIM policies

* removed blank line

* updated RL workflow README

* Integrate `data_parallel` arguments into `worker_allocator` (#402)

* 1. simplified workflow config; 2. added comments to CIM shaping

* lint issue fix

* 1. added algorithm type setting in CIM config; 2. added try-except clause for initial policy state loading

* 1. moved post_step callback inside env sampler; 2. updated README for rl workflows

* refined README for CIM

* VM scheduling with RL (#375)

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* added DQN

* added get_experiences func for ac in vm scheduling

* added post_step callback to env wrapper

* moved Aiming's tracking and plotting logic into callbacks

* added eval env wrapper

* renamed AC config variable name for VM

* vm scheduling RL code finished

* updated README

* fixed various bugs and hard coding for vm_scheduling

* uncommented callbacks for VM scheduling

* Minor revision for better code style

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* vm scheduling RL code finished

* added config.py for vm scheduling

* vm example refactoring

* fixed bugs in vm_scheduling

* removed unwanted files from cim dir

* reverted to simple policy manager as default

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* vm scheduling RL code finished

* added config.py for vm scheduling

* resolved rebase conflicts

* fixed bugs in vm_scheduling

* added get_state and set_state to vm_scheduling policy models

* updated README for vm_scheduling with RL

Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* SC refinement (#397)

* Refine test scripts & pending_order_daily logic

* Refactor code for better code style: complete type hint, correct typos, remove unused items.

* Polish test_supply_chain.py

* update import format

* Modify vehicle steps logic & remove outdated test case

* Optimize imports

* Optimize imports

* Lint error

* Lint error

* Lint error

* Add SupplyChainAction

* Lint error

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* refined workflow scripts

* fixed bug in ParallelAgentWrapper

* 1. fixed lint issues; 2. refined main script in workflows

* lint issue fix

* restored default config for rl example

* Update rollout.py

* refined env var processing in policy manager workflow

* added hasattr check in agent wrapper

* updated docker_compose_yml.py

* Minor refinement

* Minor PR. Prepare to merge latest master branch into v0.3 branch. (#412)

* Prepare to merge master_mirror

* Lint error

* Minor

* Merge latest master into v0.3 (#426)

* update docker hub init (#367)

* update docker hub init

* replace personal account with maro-team

* update hello files for CIM

* update docker repository name

* update docker file name

* fix bugs in notebook, rectify docs

* fix doc build issue

* remove docs from playground; fix citibike lp example Event issue

* update the example for vector env

* update vector env example

* update README due to PR comments

* add link to playground above MARO installation in README

* fix some typos

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* update package version

* update README for package description

* update image links for pypi package description

* update image links for pypi package description

* change the input topology schema for CIM real data mode (#372)

* change the input topology schema for CIM real data mode

* remove unused importing

* update test config file correspondingly

* add Exception for env test

* add cost factors to cim data dump

* update CimDataCollection field name

* update field name of data collection related code

* update package version

* adjust interface to reflect actual signature (#374)

Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>

* update dataclasses requirement to setup

* fix: fix spelling and grammar

* fix: fix typos in code comments and data_model.rst

* Fix Geo vis IP address & SQL logic bugs. (#383)

Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)).

* Fix the "Wrong future stop tick predictions" bug (#386)

* Propose my new solution

Refine to the pre-process version

* Optimize import

* Fix reset random seed bug (#387)

* update the reset interface of Env and BE

* Try to fix reset routes generation seed issue

* Refine random related logics.

* Minor refinement

* Test check

* Minor

* Remove unused functions so far

* Minor

Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* update package version

* Add _init_vessel_plans in business_engine.reset (#388)

* update package version

* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Refine `event_buffer/` module (#389)

* Core & Business Engine code refinement (#392)

* First version

* Optimize imports

* Add typehint

* Lint check

* Lint check

* add higher python version (#398)

* add higher python version

* update pytorch version

* update torchvision version

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* CIM scenario refinement (#400)

* Cim scenario refinement (#394)

* CIM refinement

* Fix lint error

* Fix lint error

* Cim test coverage (#395)

* Enrich tests

* Refactor CimDataGenerator

* Refactor CIM parsers

* Minor refinement

* Fix lint error

* Fix lint error

* Fix lint error

* Minor refactor

* Type

* Add two test file folders. Make a slight change to CIM BE.

* Lint error

* Lint error

* Remove unnecessary public interfaces of CIM BE

* Cim disable auto action type detection (#399)

* Haven't been tested

* Modify document

* Add ActionType checking

* Minor

* Lint error

* Action quantity should be a positive number

* Modify related docs & notebooks

* Minor

* Change test file name. Prepare to merge into master.

* Minor test patch

* Add `clear()` function to class `SimRandom` (#401)

* Add SimRandom.clear()

* Minor

* Remove commented code

* Lint error

* update package version

* Minor

* Remove docs/source/examples/multi_agent_dqn_cim.rst

* Update .gitignore

* Update .gitignore

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>

* Change `Env.set_seed()` logic (#456)

* Change Env.set_seed() logic

* Redesign CIM reset logic; fix lint issues;

* Lint

* Seed type assertion

* Remove all SC related files (#473)

* RL Toolkit V3 (#471)

* added daemon=True for multi-process rollout, policy manager and inference
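
With daemon=True, worker processes are terminated automatically when the main process exits, so a finished or crashed job leaves no orphans; standard-library illustration (not the actual rollout code):

    import multiprocessing as mp

    def rollout_worker(index: int):
        ...  # collect experiences and send them back

    if __name__ == "__main__":
        workers = [
            mp.Process(target=rollout_worker, args=(i,), daemon=True)
            for i in range(4)
        ]
        for w in workers:
            w.start()
        # Daemonic children die with the main process; no explicit cleanup needed.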

* removed obsolete files

* [REDO][PR#406]V0.2 rl refinement taskq (#408)

* Add a usable task_queue

* Rename some variables

* 1. Add ; 2. Integrate  related files; 3. Remove

* merge `data_parallel` and `num_grad_workers` into `data_parallelism`

* Fix bugs in docker_compose_yml.py and Simple/Multi-process mode.

* Move `grad_worker` into marl/rl/workflows

* 1. Merge data_parallel and num_workers into data_parallelism in config; 2. Assign recently used workers when possible in task_queue.

* Refine code and update docs of `TaskQueue`

* Support priority for tasks in `task_queue`
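
Task priorities map naturally onto Python's heap queue; a generic sketch, not the TaskQueue implementation itself:

    import heapq

    class PriorityTaskQueue:
        def __init__(self):
            self._heap, self._counter = [], 0

        def put(self, task, priority: int = 0):
            # Smaller number = higher priority; the counter breaks ties FIFO.
            heapq.heappush(self._heap, (priority, self._counter, task))
            self._counter += 1

        def get(self):
            return heapq.heappop(self._heap)[-1]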

* Update diagram of policy manager and task queue.

* Add configurable `single_task_limit` and correct docstring about `data_parallelism`

* Fix lint errors in `supply chain`

* RL policy redesign (V2) (#405)

* Draft v2.0 for V2

* Polish models with more comments

* Polish policies with more comments

* Lint

* Lint

* Add developer doc for models.

* Add developer doc for policies.

* Remove policy manager V2 since it is not used and out-of-date

* Lint

* Lint

* refined messy workflow code

* merged 'scenario_dir' and 'scenario' in rl config

* 1. refined env_sampler and agent_wrapper code; 2. added docstrings for env_sampler methods

* 1. temporarily renamed RLPolicy from policy_v2 to RLPolicyV2; 2. merged env_sampler and env_sampler_v2

* merged cim and cim_v2

* lint issue fix

* refined logging logic

* lint issue fix

* reverted unwanted changes

* ReplayMemory & IndexScheduler

MultiReplayMemory

get_actions_with_logps

EnvSampler on the road

EnvSampler

Minor

* LearnerManager

* Use batch to transfer data & add SHAPE_CHECK_FLAG
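
The flag gates cheap shape assertions that catch dimension mismatches close to their source; a hedged sketch of the idea (names are assumptions):

    import torch

    SHAPE_CHECK_FLAG = True  # turn off to skip the checks in production

    def match_shape(tensor: torch.Tensor, *expected) -> bool:
        """None is a wildcard, e.g. match_shape(x, None, 4) allows any batch size."""
        if not SHAPE_CHECK_FLAG:
            return True
        if tensor.dim() != len(expected):
            return False
        return all(e is None or s == e for s, e in zip(tensor.shape, expected))

    assert match_shape(torch.zeros(32, 4), None, 4)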

* Rename learner to trainer

* Add property for policy._is_exploring

* CIM test scenario for V3. Manual test passed. Next step: run it, make it work.

* env_sampler.py could run

* env_sampler refine on the way

* First runnable version done

* AC could run, but the result is bad. Need to check the logic

* Refine abstract method & shape check error info.

* Docs

* Very detailed comparison. Try again.

* AC done

* DQN check done

* Minor

* DDPG, not tested

* Minors

* A rough draft of MAAC

* Cannot use CIM as the multi-agent scenario.

* Minor

* MAAC refinement on the way

* Remove ActionWithAux

* Refine batch & memory

* MAAC example works

* Reproducibility fix: policy sharing between env_sampler and trainer_manager.

* Detail refinement

* Simplify the user-configured workflow

* Minor

* Refine example codes

* Minor polish

* Migrate rollout_manager to V3

* Error on the way

* Redesign torch.device management

* Rl v3 maddpg (#418)

* Add MADDPG trainer

* Fit independent critics and shared critic modes.

* Add a new property: num_policies

* Lint

* Fix a bug in `sum(rewards)`

* Rename `MADDPG` to `DiscreteMADDPG` and fix type hint.

* Rename maddpg in examples.

* Preparation for data parallel (#420)

* Preparation for data parallel

* Minor refinement & lint fix

* Lint

* Lint

* rename atomic_get_batch_grad to get_batch_grad

* Fix an unexpected commit

* distributed maddpg

* Add critic worker

* Minor

* Data-parallel-related minor changes

* Refine code structure for trainers & add more doc strings

* Revert an unwanted change

* Use TrainWorker to do the actual calculations.

* Some minor redesign of the worker's abstraction

* Add set/get_policy_state_dict back

* Refine set/get_policy_state_dict

* Polish policy trainers

move train_batch_size to abs trainer
delete _train_step_impl()
remove _record_impl
remove unused methods
a minor bug fix in maddpg

* Rl v3 data parallel grad worker (#432)

* Fit new `trainer_worker` in `grad_worker` and `task_queue`.

* Add batch dispatch

* Add `tensor_dict` for task submit interface

* Move `_remote_learn` to `AbsTrainWorker`.

* Complement docstring for task queue and trainer.

* Rename train worker to train ops; add placeholder for abstract methods;

* Lint

Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>

* [DRAFT] distributed training pipeline based on RL Toolkit V3 (#450)

* Preparation for data parallel

* Minor refinement & lint fix

* Lint

* Lint

* rename atomic_get_batch_grad to get_batch_grad

* Fix an unexpected commit

* distributed maddpg

* Add critic worker

* Minor

* Data-parallel-related minor changes

* Refine code structure for trainers & add more doc strings

* Revert an unwanted change

* Use TrainWorker to do the actual calculations.

* Some minor redesign of the worker's abstraction

* Add set/get_policy_state_dict back

* Refine set/get_policy_state_dict

* Polish policy trainers

move train_batch_size to abs trainer
delete _train_step_impl()
remove _record_impl
remove unused methods
a minor bug fix in maddpg

* Rl v3 data parallel grad worker (#432)

* Fit new `trainer_worker` in `grad_worker` and `task_queue`.

* Add batch dispatch

* Add `tensor_dict` for task submit interface

* Move `_remote_learn` to `AbsTrainWorker`.

* Complement docstring for task queue and trainer.

* distributed training pipeline draft

* added temporary test files for review purposes

* Several code style refinements (#451)

* Polish rl_v3/utils/

* Polish rl_v3/distributed/

* Polish rl_v3/policy_trainer/abs_trainer.py

* fixed merge conflicts

* unified sync and async interfaces

* refactored rl_v3; refinement in progress

* Finish the runnable pipeline under new design

* Remove outdated files; refine class names; optimize imports;

* Lint

* Minor maddpg related refinement

* Lint

Co-authored-by: Default <huo53926@126.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Minor bug fix

* Coroutine-related bug fix ("get_policy_state") (#452)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator
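
The new name says what the mapping is: policy name -> factory, so each process builds its own policy instance instead of receiving one; shape of the idea (StubPolicy is a placeholder, not a MARO class):

    class StubPolicy:
        def __init__(self, name: str):
            self.name = name

    # Each rollout/training process calls the factory to construct its own copy.
    policy_creator = {
        "dqn.policy": lambda name: StubPolicy(name),
        "ac.policy": lambda name: StubPolicy(name),
    }

    policies = {name: create(name) for name, create in policy_creator.items()}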

* deleted unwanted folder

* removed unwanted changes

* resolved PR452 comments

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Quick fix

* Redesign experience recording logic (#453)

* Two unimportant fixes

* Temp draft. Prepare to WFH

* Done

* Lint

* Lint

* Calculating advantages / returns (#454)

* V1.0

* Complete DDPG

* Rl v3 hanging issue fix (#455)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator

* unified worker interfaces

* recovered some files

* dist training + cli code move

* fixed bugs

* added retry logic to client
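
Retry with exponential backoff in the spirit of what was added (a generic sketch, not the actual client code):

    import time

    def connect_with_retry(connect, max_retries: int = 5, base_delay: float = 1.0):
        """connect: zero-argument callable that raises ConnectionError on failure."""
        for attempt in range(max_retries):
            try:
                return connect()
            except ConnectionError:
                if attempt == max_retries - 1:
                    raise
                time.sleep(base_delay * 2 ** attempt)  # exponential backoff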

* 1. refactored CIM with various algos; 2. lint

* lint

* added type hint

* removed some logs

* lint

* Make main.py more IDE friendly

* Make main.py more IDE friendly

* Lint

* Final test & format. Ready to merge.

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* Rl v3 parallel rollout (#457)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator

* unified worker interfaces

* recovered some files

* dist training + cli code move

* fixed bugs

* added retry logic to client

* 1. refactored CIM with various algos; 2. lint

* lint

* added type hint

* removed some logs

* lint

* Make main.py more IDE friendly

* Make main.py more IDE friendly

* Lint

* load balancing dispatcher

* added parallel rollout

* lint

* Tracker variable type issue; rename to env_sampler_creator;

* Rl v3 parallel rollout follow ups (#458)

* AbsWorker & AbsDispatcher

* Pass env idx to AbsTrainer.record() method, and let the trainer decide how to record experiences sampled from different worlds.

* Fix policy_creator reuse bug

* Format code

* Merge AbsTrainerManager & SimpleTrainerManager

* AC test passed

* Lint

* Remove AbsTrainer.build() method. Put all initialization operations into __init__

* Redesign AC preprocess batches logic

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* MADDPG performance bug fix (#459)

* Fix MARL (MADDPG) terminal recording bug; some other minor refinements;

* Restore Trainer.build() method

* Calculate latest action in the get_actor_grad method in MADDPG.

* Share critic bug fix

* Rl v3 example update (#461)

* updated vm_scheduling example and cim notebook

* fixed bugs in vm_scheduling

* added local train method

* bug fix

* modified async client logic to fix hidden issue

* reverted to default config

* fixed PR comments and some bugs

* removed hardcode

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Done (#462)

* Rl v3 load save (#463)

* added load/save feature

* fixed some bugs

* reverted unwanted changes

* lint

* fixed PR comments

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* RL Toolkit data parallelism revamp & config utils (#464)

* added load/save feature

* fixed some bugs

* reverted unwanted changes

* lint

* fixed PR comments

* 1. fixed data parallelism issue; 2. added config validator; 3. refactored cli local

* 1. fixed rollout exit issue; 2. refined config

* removed config file from example

* fixed lint issues

* fixed lint issues

* added main.py under examples/rl

* fixed lint issues

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* RL doc string (#465)

* First rough draft

* Minors

* Reformat

* Lint

* Resolve PR comments

* Rl type specific env getter (#466)

* 1. type-sensitive env variable getter; 2. updated READMEs for examples
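
A type-sensitive getter casts the raw os.environ string to the expected type in one place; sketch of the idea (the real helper's signature may differ):

    import os

    def get_env(name: str, type_=str, required: bool = True, default=None):
        """Read an environment variable and cast it, e.g. get_env("NUM_WORKERS", int)."""
        if name not in os.environ:
            if required:
                raise KeyError(f"Missing environment variable: {name}")
            return default
        value = os.environ[name]
        return value.lower() in ("true", "1") if type_ is bool else type_(value)

    num_workers = get_env("NUM_ROLLOUT_WORKERS", int, required=False, default=4)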

* fixed bugs

* fixed bugs

* bug fixes

* lint

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Example bug fix

* Optimize parser.py

* Resolve PR comments

* Rl config doc (#467)

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* added detailed doc

* lint

* wording refined

* resolved some PR comments

* resolved more PR comments

* typo fix

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* RL online doc (#469)

* Model, policy, trainer

* RL workflows and env sampler doc in RST (#468)

* First rough draft

* Minors

* Reformat

* Lint

* Resolve PR comments

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* Rl type specific env getter (#466)

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* fixed bugs

* fixed bugs

* bug fixes

* lint

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Example bug fix

* Optimize parser.py

* Resolve PR comments

* added detailed doc

* lint

* wording refined

* resolved some PR comments

* rewriting rl toolkit rst

* resolved more PR comments

* typo fix

* updated rst

Co-authored-by: Huoran Li <huoranli@microsoft.com>
Co-authored-by: Default <huo53926@126.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Finish docs/source/key_components/rl_toolkit.rst

* API doc

* RL online doc image fix (#470)

* resolved some PR comments

* fix

* fixed PR comments

* added numfig=True setting in conf.py for sphinx
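
numfig is a standard Sphinx option: with it enabled, figures and tables are numbered automatically and :numref: cross-references resolve:

    # docs/source/conf.py
    numfig = True  # number figures/tables so :numref:`fig-label` works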

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Resolve PR comments

* Add example github link

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Rl v3 pr comment resolution (#474)

* added load/save feature

* 1. resolved pr comments; 2. reverted maro/cli/k8s

* fixed some bugs

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* RL renaming v2 (#476)

* Change all Logger in RL to LoggerV2

* TrainerManager => TrainingManager

* Add Trainer suffix to all algorithms

* Finish docs

* Update interface names

* Minor fix

* Cherry pick latest RL (#498)

* Cherry pick

* Remove SC related files

* Cherry pick RL changes from `sc_refinement` (latest commit: `2a4869`) (#509)

* Cherry pick RL changes from sc_refinement (2a4869)

* Limit time display precision

* RL incremental refactor (#501)

* Refactor rollout logic. Allow multiple sampling in one epoch, so that we can generate more data for training.

AC & PPO for continuous action policy; refine AC & PPO logic.

Cherry pick RL changes from GYM-DDPG

Cherry pick RL changes from GYM-SAC

Minor error in doc string

* Add min_n_sample in template and parser
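
min_n_sample pairs with the multiple-sampling change above: keep sampling within the epoch until enough experiences have accumulated; a hedged sketch (assuming sample() returns a list of experiences):

    def collect_epoch(env_sampler, min_n_sample: int) -> list:
        """Sample repeatedly until at least min_n_sample experiences are gathered."""
        experiences = []
        while len(experiences) < min_n_sample:
            experiences.extend(env_sampler.sample())
        return experiences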

* Resolve PR comments. Fix a minor issue in SAC.

* RL component bundle (#513)

* CIM passed

* Update workers

* Refine annotations

* VM passed

* Code formatting.

* Minor import loop issue

* Pass batch in PPO again

* Remove Scenario

* Complete docs

* Minor

* Remove segment

* Optimize logic in RLComponentBundle

* Resolve PR comments

* Move 'post' methods from RLComponentBundle to EnvSampler

* Add method to get mapping of available tick to frame index (#415)

* add method to get mapping of available tick to frame index
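
When snapshot_resolution > 1, several ticks share one frame; the mapping is conceptually a floor division (illustrative, not the exact implementation):

    def tick_to_frame_index(tick: int, snapshot_resolution: int) -> int:
        return tick // snapshot_resolution

    mapping = {tick: tick_to_frame_index(tick, 2) for tick in range(6)}
    # {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}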

* fix lint issue

* fix naming issue

* Cherry pick from sc_refinement (#527)

* Cherry pick from sc_refinement

* Cherry pick from sc_refinement

* Refine `terminal` / `next_agent_state` logic (#531)

* Optimize RL toolkit

* Fix bug in terminal/next_state generation

* Rewrite terminal/next_state logic again

* Minor renaming

* Minor bug fix

* Resolve PR comments

* Merge master into v0.3 (#536)

* update docker hub init (#367)

* update docker hub init

* replace personal account with maro-team

* update hello files for CIM

* update docker repository name

* update docker file name

* fix bugs in notebook, rectify docs

* fix doc build issue

* remove docs from playground; fix citibike lp example Event issue

* update the example for vector env

* update vector env example

* update README due to PR comments

* add link to playground above MARO installation in README

* fix some typos

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* update package version

* update README for package description

* update image links for pypi package description

* update image links for pypi package description

* change the input topology schema for CIM real data mode (#372)

* change the input topology schema for CIM real data mode

* remove unused importing

* update test config file correspondingly

* add Exception for env test

* add cost factors to cim data dump

* update CimDataCollection field name

* update field name of data collection related code

* update package version

* adjust interface to reflect actual signature (#374)

Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>

* update dataclasses requirement to setup

* fix: fix spelling and grammar

* fix: fix typos in code comments and data_model.rst

* Fix Geo vis IP address & SQL logic bugs. (#383)

Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)).

* Fix the "Wrong future stop tick predictions" bug (#386)

* Propose my new solution

Refine to the pre-process version

* Optimize import

* Fix reset random seed bug (#387)

* update the reset interface of Env and BE

* Try to fix reset routes generation seed issue

* Refine random related logics.

* Minor refinement

* Test check

* Minor

* Remove unused functions so far

* Minor

Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* update package version

* Add _init_vessel_plans in business_engine.reset (#388)

* update package version

* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Refine `event_buffer/` module (#389)

* Core & Business Engine code refinement (#392)

* First version

* Optimize imports

* Add typehint

* Lint check

* Lint check

* add higher python version (#398)

* add higher python version

* update pytorch version

* update torchvision version

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* CIM scenario refinement (#400)

* Cim scenario refinement (#394)

* CIM refinement

* Fix lint error

* Fix lint error

* Cim test coverage (#395)

* Enrich tests

* Refactor CimDataGenerator

* Refactor CIM parsers

* Minor refinement

* Fix lint error

* Fix lint error

* Fix lint error

* Minor refactor

* Type

* Add two test file folders. Make a slight change to CIM BE.

* Lint error

* Lint error

* Remove unnecessary public interfaces of CIM BE

* Cim disable auto action type detection (#399)

* Haven't been tested

* Modify document

* Add ActionType checking

* Minor

* Lint error

* Action quantity should be a positive number

* Modify related docs & notebooks

* Minor

* Change test file name. Prepare to merge into master.

* Minor test patch

* Add `clear()` function to class `SimRandom` (#401)

* Add SimRandom.clear()

* Minor

* Remove commented code

* Lint error

* update package version

* add branch v0.3 to github workflow

* update github test workflow

* Update requirements.dev.txt (#444)

Added versions for the dependencies and resolved some conflicts that occur when installing. Pinning these version numbers makes the exact dependencies explicit.

* Bump ipython from 7.10.1 to 7.16.3 in /notebooks (#460)

Bumps [ipython](https://github.com/ipython/ipython) from 7.10.1 to 7.16.3.
- [Release notes](https://github.com/ipython/ipython/releases)
- [Commits](https://github.com/ipython/ipython/compare/7.10.1...7.16.3)

---
updated-dependencies:
- dependency-name: ipython
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add & sort requirements.dev.txt

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>
Co-authored-by: solosilence <abhishekkr23rs@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Merge master into v0.3 (#545)

* update docker hub init (#367)

* update docker hub init

* replace personal account with maro-team

* update hello files for CIM

* update docker repository name

* update docker file name

* fix bugs in notebook, rectify docs

* fix doc build issue

* remove docs from playground; fix citibike lp example Event issue

* update the example for vector env

* update vector env example

* update README due to PR comments

* add link to playground above MARO installation in README

* fix some typos

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* update package version

* update README for package description

* update image links for pypi package description

* update image links for pypi package description

* change the input topology schema for CIM real data mode (#372)

* change the input topology schema for CIM real data mode

* remove unused importing

* update test config file correspondingly

* add Exception for env test

* add cost factors to cim data dump

* update CimDataCollection field name

* update field name of data collection related code

* update package version

* adjust interface to reflect actual signature (#374)

Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>

* update dataclasses requirement to setup

* fix: fix spelling and grammar

* fix: fix typos in code comments and data_model.rst

* Fix Geo vis IP address & SQL logic bugs. (#383)

Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)).

* Fix the "Wrong future stop tick predictions" bug (#386)

* Propose my new solution

Refine to the pre-process version

* Optimize import

* Fix reset random seed bug (#387)

* update the reset interface of Env and BE

* Try to fix reset routes generation seed issue

* Refine random related logics.

* Minor refinement

* Test check

* Minor

* Remove unused functions so far

* Minor

Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* update package version

* Add _init_vessel_plans in business_engine.reset (#388)

* update package version

* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Refine `event_buffer/` module (#389)

* Core & Business Engine code refinement (#392)

* First version

* Optimize imports

* Add typehint

* Lint check

* Lint check

* add higher python version (#398)

* add higher python version

* update pytorch version

* update torchvision version

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* CIM scenario refinement (#400)

* Cim scenario refinement (#394)

* CIM refinement

* Fix lint error

* Fix lint error

* Cim test coverage (#395)

* Enrich tests

* Refactor CimDataGenerator

* Refactor CIM parsers

* Minor refinement

* Fix lint error

* Fix lint error

* Fix lint error

* Minor refactor

* Type

* Add two test file folders. Make a slight change to CIM BE.

* Lint error

* Lint error

* Remove unnecessary public interfaces of CIM BE

* Cim disable auto action type detection (#399)

* Haven't been tested

* Modify document

* Add ActionType checking

* Minor

* Lint error

* Action quantity should be a positive number

* Modify related docs & notebooks

* Minor

* Change test file name. Prepare to merge into master.

* Minor test patch

* Add `clear()` function to class `SimRandom` (#401)

* Add SimRandom.clear()

* Minor

* Remove commented code

* Lint error

* update package version

* add branch v0.3 to github workflow

* update github test workflow

* Update requirements.dev.txt (#444)

Added versions for the dependencies and resolved some conflicts that occur when installing. Pinning these version numbers makes the exact dependencies explicit.

* Bump ipython from 7.10.1 to 7.16.3 in /notebooks (#460)

Bumps [ipython](https://github.com/ipython/ipython) from 7.10.1 to 7.16.3.
- [Release notes](https://github.com/ipython/ipython/releases)
- [Commits](https://github.com/ipython/ipython/compare/7.10.1...7.16.3)

---
updated-dependencies:
- dependency-name: ipython
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* update github workflow config

* MARO v0.3: a new design of RL Toolkit, CLI refactorization, and corresponding updates. (#539)

* refined proxy coding style

* updated images and refined doc

* updated images

* updated CIM-AC example

* refined proxy retry logic

* call policy update only for AbsCorePolicy

* add limitation of AbsCorePolicy in Actor.collect()

* refined actor to return only experiences for policies that received new experiences

* fix MsgKey issue in rollout_manager

* fix typo in learner

* call exit function for parallel rollout manager

* update supply chain example distributed training scripts

* 1. moved exploration scheduling to rollout manager; 2. fixed bug in lr schedule registration in core model; 3. added parallel policy manager prototype

* reformat render

* fix supply chain business engine action type problem

* reset supply chain example render figsize from 4 to 3

* Add render to all modes of supply chain example

* fix OR policy typos

* 1. added parallel policy manager prototype; 2. used training ep for evaluation episodes

* refined parallel policy manager

* updated rl/__init__.py

* fixed lint issues and CIM local learner bugs

* deleted unwanted supply_chain test files

* revised default config for cim-dqn

* removed test_store.py as it is no longer needed

* 1. changed Actor class to rollout_worker function; 2. renamed algorithm to algorithms

* updated figures

* removed unwanted import

* refactored CIM-DQN example

* added MultiProcessRolloutManager and MultiProcessTrainingManager

* updated doc

* lint issue fix

* lint issue fix

* fixed import formatting

* [Feature] Prioritized Experience Replay (#355)

* added prioritized experience replay

* deleted unwanted supply_chain test files

* fixed import order

* import fix

* fixed lint issues

* fixed import formatting

* added note in docstring that rank-based PER has yet to be implemented

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* rm AbsDecisionGenerator

* small fixes

* bug fix

* reorganized training folder structure

* fixed lint issues

* fixed lint issues

* policy manager refined

* lint fix

* restructured CIM-dqn sync code

* added policy version index and used it as a measure of experience staleness

* lint issue fix

* lint issue fix

* switched log_dir and proxy_kwargs order

* cim example refinement

* eval schedule sorted only when it's a list

* eval schedule sorted only when it's a list

* update sc env wrapper

* added docker scripts for cim-dqn

* refactored example folder structure and added workflow templates

* fixed lint issues

* fixed lint issues

* fixed template bugs

* removed unused imports

* refactoring sc in progress

* simplified cim meta

* fixed build.sh path bug

* template refinement

* deleted obsolete svgs

* updated learner logs

* minor edits

* refactored templates for easy merge with async PR

* added component names for rollout manager and policy manager

* fixed incorrect position to add last episode to eval schedule

* added max_lag option in templates

* formatting edit in docker_compose_yml script

* moved local learner and early stopper outside sync_tools

* refactored rl toolkit folder structure

* refactored rl toolkit folder structure

* moved env_wrapper and agent_wrapper inside rl/learner

* refined scripts

* fixed typo in script

* changes needed for running sc

* removed unwanted imports

* config change for testing sc scenario

* changes for perf testing

* Asynchronous Training (#364)

* remote inference code draft

* changed actor to rollout_worker and updated init files

* removed unwanted import

* updated inits

* more async code

* added async scripts

* added async training code & scripts for CIM-dqn

* changed async to async_tools to avoid conflict with python keyword

* reverted unwanted change to dockerfile

* added doc for policy server

* addressed PR comments and fixed a bug in docker_compose_yml.py

* fixed lint issue

* resolved PR comment

* resolved merge conflicts

* added async templates

* added proxy.close() for actor and policy_server

* fixed incorrect position to add last episode to eval schedule

* reverted unwanted changes

* added missing async files

* rm unwanted echo in kill.sh

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* renamed sync to synchronous and async to asynchronous to avoid conflict with keyword

* added missing policy version increment in LocalPolicyManager

* refined rollout manager recv logic

* removed a debugging print

* added sleep in distributed launcher to avoid hanging

* updated api doc and rl toolkit doc

* refined dynamic imports using importlib

* 1. moved policy update triggers to policy manager; 2. added version control in policy manager

* fixed a few bugs and updated cim RL example

* fixed a few more bugs

* added agent wrapper instantiation to workflows

* added agent wrapper instantiation to workflows

* removed abs_block and added max_prob option for DiscretePolicyNet and DiscreteACNet

* fixed incorrect get_ac_policy signature for CIM

* moved exploration inside core policy

* added state to exploration call to support context-dependent exploration

* separated non_rl_policy_index and rl_policy_index in workflows

* modified sc example code according to workflow changes

* modified sc example code according to workflow changes

* added replay_agent_ids parameter to get_env_func for RL examples

* fixed a few bugs

* added maro/simulator/scenarios/supply_chain as bind mount

* added post-step, post-collect, post-eval and post-update callbacks

* fixed lint issues

* fixed lint issues

* moved instantiation of policy manager inside simple learner

* fixed env_wrapper get_reward signature

* minor edits

* removed get_experience kwargs from env_wrapper

* 1. renamed step_callback to post_step in env_wrapper; 2. added get_eval_env_func to RL workflows

* added rollout exp distribution option in RL examples

* removed unwanted files

* 1. made logger internal in learner; 2. removed logger creation in abs classes

* checked out supply chain test files from v0.2_sc

* 1. added missing model.eval() to choose_action; 2. added entropy features to AC

* fixed a bug in ac entropy

* abbreviated coefficient to coeff

* removed -dqn from job name in rl example config

* added tmp patch to dev.df

* renamed image name for running rl examples

* added get_loss interface for core policies

* added policy manager in rl_toolkit.rst

* 1. env_wrapper bug fix; 2. policy manager update logic refinement

* refactored policy and algorithms

* policy interface redesigned

* refined policy interfaces

* fixed typo

* fixed bugs in refactored policy interface

* fixed some bugs

* refactoring in progress

* policy interface and policy manager redesigned

* 1. fixed bugs in ac and pg; 2. fixed bugs in rl workflow scripts

* fixed bug in distributed policy manager

* fixed lint issues

* fixed lint issues

* added scipy in setup

* 1. trimmed rollout manager code; 2. added option to docker scripts

* updated api doc for policy manager

* 1. simplified rl/learning code structure; 2. fixed bugs in rl example docker script

* 1. simplified rl example structure; 2. fixed lint issues

* further rl toolkit code simplifications

* more numpy-based optimization in RL toolkit

* moved replay buffer inside policy

* bug fixes

* numpy optimization and associated refactoring

* extracted shaping logic out of env_sampler

* fixed bug in CIM shaping and lint issues

* preliminary implementation of parallel batch inference

* fixed bug in ddpg transition recording

* put get_state, get_env_actions, get_reward back in EnvSampler

* simplified exploration and core model interfaces

* bug fixes and doc update

* added improve() interface for RLPolicy for single-thread support

* fixed simple policy manager bug

* updated doc, rst, notebook

* updated notebook

* fixed lint issues

* fixed entropy bugs in ac.py

* reverted to simple policy manager as default

* 1. unified single-thread and distributed mode in learning_loop.py; 2. updated api doc for algorithms and rst for rl toolkit

* fixed lint issues and updated rl toolkit images

* removed obsolete images

* added back agent2policy for general workflow use

* V0.2 rl refinement dist (#377)

* Support `slice` operation in ExperienceSet

* Support naive distributed policy training by proxy

* Dynamically allocate trainers according to the number of experiences

* code check

* code check

* code check

* Fix a bug in distributed training with no gradient

* Code check

* Move Back-Propagation from trainer to policy_manager and extract trainer-allocation strategy

* 1. call allocate_trainer() at the start of update(); 2. refine according to code review

* Code check

* Refine code with new interface

* Update docs of PolicyManager and ExperienceSet

* Add images for rl_toolkit docs

* Update diagram of PolicyManager

* Refine with new interface

* Extract allocation strategy into `allocation_strategy.py`

* add `distributed_learn()` in policies for data-parallel training

* Update doc of RL_toolkit

* Add gradient workers for data-parallel

* Refine code and update docs

* Lint check

* Refine by comments

* Rename `trainer` to `worker`

* Rename `distributed_learn` to `learn_with_data_parallel`

* Refine allocator and remove redundant code in policy_manager

* remove arguments in allocate_by_policy and so on

* added checkpointing for simple and multi-process policy managers

* 1. bug fixes in checkpointing; 2. removed version and max_lag in rollout manager

* added missing set_state and get_state for CIM policies

* removed blank line

* updated RL workflow README

* Integrate `data_parallel` arguments into `worker_allocator` (#402)

* 1. simplified workflow config; 2. added comments to CIM shaping

* lint issue fix

* 1. added algorithm type setting in CIM config; 2. added try-except clause for initial policy state loading

* 1. moved post_step callback inside env sampler; 2. updated README for rl workflows

* refined README for CIM

* VM scheduling with RL (#375)

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* added DQN

* added get_experiences func for ac in vm scheduling

* added post_step callback to env wrapper

* moved Aiming's tracking and plotting logic into callbacks

* added eval env wrapper

* renamed AC config variable name for VM

* vm scheduling RL code finished

* updated README

* fixed various bugs and hard coding for vm_scheduling

* uncommented callbacks for VM scheduling

* Minor revision for better code style

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* vm scheduling RL code finished

* added config.py for vm scheduling

* vm example refactoring

* fixed bugs in vm_scheduling

* removed unwanted files from cim dir

* reverted to simple policy manager as default

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* vm scheduling RL code finished

* added config.py for vm scheduling

* resolved rebase conflicts

* fixed bugs in vm_scheduling

* added get_state and set_state to vm_scheduling policy models

* updated README for vm_scheduling with RL

Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* SC refinement (#397)

* Refine test scripts & pending_order_daily logic

* Refactor code for better code style: complete type hint, correct typos, remove unused items.

* Polish test_supply_chain.py

* update import format

* Modify vehicle steps logic & remove outdated test case

* Optimize imports

* Optimize imports

* Lint error

* Lint error

* Lint error

* Add SupplyChainAction

* Lint error

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* refined workflow scripts

* fixed bug in ParallelAgentWrapper

* 1. fixed lint issues; 2. refined main script in workflows

* lint issue fix

* restored default config for rl example

* Update rollout.py

* refined env var processing in policy manager workflow

* added hasattr check in agent wrapper

* updated docker_compose_yml.py

* Minor refinement

* Minor PR. Prepare to merge latest master branch into v0.3 branch. (#412)

* Prepare to merge master_mirror

* Lint error

* Minor

* Merge latest master into v0.3 (#426)

* update docker hub init (#367)

* update docker hub init

* replace personal account with maro-team

* update hello files for CIM

* update docker repository name

* update docker file name

* fix bugs in notebook, rectify docs

* fix doc build issue

* remove docs from playground; fix citibike lp example Event issue

* update the example for vector env

* update vector env example

* update README due to PR comments

* add link to playground above MARO installation in README

* fix some typos

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* update package version

* update README for package description

* update image links for pypi package description

* update image links for pypi package description

* change the input topology schema for CIM real data mode (#372)

* change the input topology schema for CIM real data mode

* remove unused importing

* update test config file correspondingly

* add Exception for env test

* add cost factors to cim data dump

* update CimDataCollection field name

* update field name of data collection related code

* update package version

* adjust interface to reflect actual signature (#374)

Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>

* update dataclasses requirement to setup

* fix: fix spelling and grammar

* fix: fix typos in code comments and data_model.rst

* Fix Geo vis IP address & SQL logic bugs. (#383)

Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)).

* Fix the "Wrong future stop tick predictions" bug (#386)

* Propose my new solution

Refine to the pre-process version

* Optimize import

* Fix reset random seed bug (#387)

* update the reset interface of Env and BE

* Try to fix reset routes generation seed issue

* Refine random related logics.

* Minor refinement

* Test check

* Minor

* Remove unused functions so far

* Minor

Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* update package version

* Add _init_vessel_plans in business_engine.reset (#388)

* update package version

* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Refine `event_buffer/` module (#389)

* Core & Business Engine code refinement (#392)

* First version

* Optimize imports

* Add typehint

* Lint check

* Lint check

* add higher python version (#398)

* add higher python version

* update pytorch version

* update torchvision version

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* CIM scenario refinement (#400)

* Cim scenario refinement (#394)

* CIM refinement

* Fix lint error

* Fix lint error

* Cim test coverage (#395)

* Enrich tests

* Refactor CimDataGenerator

* Refactor CIM parsers

* Minor refinement

* Fix lint error

* Fix lint error

* Fix lint error

* Minor refactor

* Type

* Add two test file folders. Make a slight change to CIM BE.

* Lint error

* Lint error

* Remove unnecessary public interfaces of CIM BE

* Cim disable auto action type detection (#399)

* Haven't been tested

* Modify document

* Add ActionType checking

* Minor

* Lint error

* Action quantity should be a positive number

* Modify related docs & notebooks

* Minor

* Change test file name. Prepare to merge into master.

* Minor test patch

* Add `clear()` function to class `SimRandom` (#401)

* Add SimRandom.clear()

* Minor

* Remove commented code

* Lint error

* update package version

* Minor

* Remove docs/source/examples/multi_agent_dqn_cim.rst

* Update .gitignore

* Update .gitignore

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>

* Change `Env.set_seed()` logic (#456)

* Change Env.set_seed() logic

* Redesign CIM reset logic; fix lint issues;

* Lint

* Seed type assertion

* Remove all SC related files (#473)

* RL Toolkit V3 (#471)

* added daemon=True for multi-process rollout, policy manager and inference

* removed obsolete files

* [REDO][PR#406]V0.2 rl refinement taskq (#408)

* Add a usable task_queue

* Rename some variables

* 1. Add ; 2. Integrate  related files; 3. Remove

* merge `data_parallel` and `num_grad_workers` into `data_parallelism`

* Fix bugs in docker_compose_yml.py and Simple/Multi-process mode.

* Move `grad_worker` into marl/rl/workflows

* 1. Merge data_parallel and num_workers into data_parallelism in config; 2. Assign recently used workers when possible in task_queue.

* Refine code and update docs of `TaskQueue`

* Support priority for tasks in `task_queue`

* Update diagram of policy manager and task queue.

* Add configurable `single_task_limit` and correct docstring about `data_parallelism`

* Fix lint errors in `supply chain`

* RL policy redesign (V2) (#405)

* Draft v2.0 for V2

* Polish models with more comments

* Polish policies with more comments

* Lint

* Lint

* Add developer doc for models.

* Add developer doc for policies.

* Remove policy manager V2 since it is not used and out-of-date

* Lint

* Lint

* refined messy workflow code

* merged 'scenario_dir' and 'scenario' in rl config

* 1. refined env_sampler and agent_wrapper code; 2. added docstrings for env_sampler methods

* 1. temporarily renamed RLPolicy from policy_v2 to RLPolicyV2; 2. merged env_sampler and env_sampler_v2

* merged cim and cim_v2

* lint issue fix

* refined logging logic

* lint issue fix

* reverted unwanted changes

* ReplayMemory & IndexScheduler

MultiReplayMemory

get_actions_with_logps

EnvSampler in progress

EnvSampler

Minor

* LearnerManager

* Use batch to transfer data & add SHAPE_CHECK_FLAG

* Rename learner to trainer

* Add property for policy._is_exploring

* CIM test scenario for V3. Manual test passed. Next step: run it and make it work.

* env_sampler.py could run

* env_sampler refinement in progress

* First runnable version done

* AC could run, but the result is bad. Need to check the logic

* Refine abstract method & shape check error info.

* Docs

* Very detailed comparison. Try again.

* AC done

* DQN check done

* Minor

* DDPG, not tested

* Minors

* A rough draft of MAAC

* Cannot use CIM as the multi-agent scenario.

* Minor

* MAAC refinement on the way

* Remove ActionWithAux

* Refine batch & memory

* MAAC example works

* Reproducibility fix: policy sharing between env_sampler and trainer_manager.

* Detail refinement

* Simplify the user-configured workflow

* Minor

* Refine example codes

* Minor polishing

* Migrate rollout_manager to V3

* Error on the way

* Redesign torch.device management

* Rl v3 maddpg (#418)

* Add MADDPG trainer

* Fit independent critics and shared critic modes.

* Add a new property: num_policies

* Lint

* Fix a bug in `sum(rewards)`

* Rename `MADDPG` to `DiscreteMADDPG` and fix type hint.

* Rename maddpg in examples.

* Preparation for data parallel (#420)

* Preparation for data parallel

* Minor refinement & lint fix

* Lint

* Lint

* rename atomic_get_batch_grad to get_batch_grad

* Fix an unexpected commit

* distributed maddpg

* Add critic worker

* Minor

* Data-parallel-related minor fixes

* Refine code structure for trainers & add more doc strings

* Revert an unwanted change

* Use TrainWorker to do the actual calculations.

* Some minor redesign of the worker's abstraction

* Add set/get_policy_state_dict back

* Refine set/get_policy_state_dict

* Polish policy trainers

move train_batch_size to abs trainer
delete _train_step_impl()
remove _record_impl
remove unused methods
a minor bug fix in maddpg

* Rl v3 data parallel grad worker (#432)

* Fit new `trainer_worker` in `grad_worker` and `task_queue`.

* Add batch dispatch

* Add `tensor_dict` for task submit interface

* Move `_remote_learn` to `AbsTrainWorker`.

* Complement docstring for task queue and trainer.

* Rename train worker to train ops; add placeholder for abstract methods;

* Lint

Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>

* [DRAFT] distributed training pipeline based on RL Toolkit V3 (#450)

* Preparation for data parallel

* Minor refinement & lint fix

* Lint

* Lint

* rename atomic_get_batch_grad to get_batch_grad

* Fix an unexpected commit

* distributed maddpg

* Add critic worker

* Minor

* Data-parallel-related minor fixes

* Refine code structure for trainers & add more doc strings

* Revert an unwanted change

* Use TrainWorker to do the actual calculations.

* Some minor redesign of the worker's abstraction

* Add set/get_policy_state_dict back

* Refine set/get_policy_state_dict

* Polish policy trainers

move train_batch_size to abs trainer
delete _train_step_impl()
remove _record_impl
remove unused methods
a minor bug fix in maddpg

* Rl v3 data parallel grad worker (#432)

* Fit new `trainer_worker` in `grad_worker` and `task_queue`.

* Add batch dispatch

* Add `tensor_dict` for task submit interface

* Move `_remote_learn` to `AbsTrainWorker`.

* Complement docstring for task queue and trainer.

* distributed training pipeline draft

* added temporary test files for review purposes

* Several code style refinements (#451)

* Polish rl_v3/utils/

* Polish rl_v3/distributed/

* Polish rl_v3/policy_trainer/abs_trainer.py

* fixed merge conflicts

* unified sync and async interfaces

* refactored rl_v3; refinement in progress

* Finish the runnable pipeline under new design

* Remove outdated files; refine class names; optimize imports;

* Lint

* Minor maddpg related refinement

* Lint

Co-authored-by: Default <huo53926@126.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Minor bug fix

* Coroutine-related bug fix ("get_policy_state") (#452)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator

* deleted unwanted folder

* removed unwanted changes

* resolved PR452 comments

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Quick fix

* Redesign experience recording logic (#453)

* Two unimportant fixes

* Temp draft. Prepare to WFH

* Done

* Lint

* Lint

* Calculating advantages / returns (#454)

* V1.0

* Complete DDPG

* Rl v3 hanging issue fix (#455)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator

* unified worker interfaces

* recovered some files

* dist training …
16 people authored Mar 30, 2023
1 parent 94548c7 commit ef2a358
Showing 76 changed files with 2,223 additions and 550 deletions.
2 changes: 2 additions & 0 deletions examples/cim/rl/algorithms/ac.py
@@ -11,6 +11,7 @@
actor_net_conf = {
"hidden_dims": [256, 128, 64],
"activation": torch.nn.Tanh,
"output_activation": torch.nn.Tanh,
"softmax": True,
"batch_norm": False,
"head": True,
@@ -19,6 +20,7 @@
"hidden_dims": [256, 128, 64],
"output_dim": 1,
"activation": torch.nn.LeakyReLU,
"output_activation": torch.nn.LeakyReLU,
"softmax": False,
"batch_norm": True,
"head": True,
1 change: 1 addition & 0 deletions examples/cim/rl/algorithms/dqn.py
@@ -12,6 +12,7 @@
q_net_conf = {
"hidden_dims": [256, 128, 64, 32],
"activation": torch.nn.LeakyReLU,
"output_activation": torch.nn.LeakyReLU,
"softmax": False,
"batch_norm": True,
"skip_connection": False,
2 changes: 2 additions & 0 deletions examples/cim/rl/algorithms/maddpg.py
@@ -14,6 +14,7 @@
actor_net_conf = {
"hidden_dims": [256, 128, 64],
"activation": torch.nn.Tanh,
"output_activation": torch.nn.Tanh,
"softmax": True,
"batch_norm": False,
"head": True,
@@ -22,6 +23,7 @@
"hidden_dims": [256, 128, 64],
"output_dim": 1,
"activation": torch.nn.LeakyReLU,
"output_activation": torch.nn.LeakyReLU,
"softmax": False,
"batch_norm": True,
"head": True,
26 changes: 20 additions & 6 deletions examples/cim/rl/env_sampler.py
@@ -90,11 +90,25 @@ def post_collect(self, info_list: list, ep: int) -> None:
for info in info_list:
print(f"env summary (episode {ep}): {info['env_metric']}")

# print the average env metric
if len(info_list) > 1:
metric_keys, num_envs = info_list[0]["env_metric"].keys(), len(info_list)
avg_metric = {key: sum(info["env_metric"][key] for info in info_list) / num_envs for key in metric_keys}
print(f"average env summary (episode {ep}): {avg_metric}")
# average env metric
metric_keys, num_envs = info_list[0]["env_metric"].keys(), len(info_list)
avg_metric = {key: sum(info["env_metric"][key] for info in info_list) / num_envs for key in metric_keys}
print(f"average env summary (episode {ep}): {avg_metric}")

self.metrics.update(avg_metric)
self.metrics = {k: v for k, v in self.metrics.items() if not k.startswith("val/")}

def post_evaluate(self, info_list: list, ep: int) -> None:
self.post_collect(info_list, ep)
# print the env metric from each rollout worker
for info in info_list:
print(f"env summary (episode {ep}): {info['env_metric']}")

# average env metric
metric_keys, num_envs = info_list[0]["env_metric"].keys(), len(info_list)
avg_metric = {key: sum(info["env_metric"][key] for info in info_list) / num_envs for key in metric_keys}
print(f"average env summary (episode {ep}): {avg_metric}")

self.metrics.update({"val/" + k: v for k, v in avg_metric.items()})

def monitor_metrics(self) -> float:
return -self.metrics["val/container_shortage"]
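
The reworked `post_collect` / `post_evaluate` pair above averages each metric across rollout workers and namespaces evaluation results with a `val/` prefix, so `monitor_metrics` can track a single scalar. A self-contained sketch of that bookkeeping (the surrounding class is elided; `container_shortage` follows the CIM example):

```python
from typing import Dict, List

def average_env_metrics(info_list: List[dict]) -> Dict[str, float]:
    # Average every metric key across the rollout workers' env summaries.
    keys, n = info_list[0]["env_metric"].keys(), len(info_list)
    return {k: sum(info["env_metric"][k] for info in info_list) / n for k in keys}

metrics: Dict[str, float] = {}
avg = average_env_metrics([{"env_metric": {"container_shortage": 10.0}},
                           {"env_metric": {"container_shortage": 14.0}}])

# Training episode: record fresh averages, dropping stale evaluation entries.
metrics.update(avg)
metrics = {k: v for k, v in metrics.items() if not k.startswith("val/")}

# Evaluation episode: store the same averages under a "val/" namespace.
metrics.update({"val/" + k: v for k, v in avg.items()})
print(-metrics["val/container_shortage"])  # the scalar monitor_metrics() would report
```
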
2 changes: 1 addition & 1 deletion examples/cim/rl/rl_component_bundle.py
@@ -13,7 +13,7 @@

# Environments
learn_env = Env(**env_conf)
test_env = learn_env
test_env = Env(**env_conf)

# Agent, policy, and trainers
num_agents = len(learn_env.agent_idx_list)
2 changes: 1 addition & 1 deletion examples/rl/README.md
@@ -7,7 +7,7 @@ This folder contains scenarios that employ reinforcement learning. MARO's RL too
The entry point of an RL workflow is a YAML config file. For readers' convenience, we call this config file `config.yml` in the rest of this doc. `config.yml` specifies the paths of all necessary resources, definitions, and configurations to run the job. MARO provides a comprehensive template of the config file with detailed explanations (`maro/maro/rl/workflows/config/template.yml`). MARO also provides several simple examples of `config.yml` under the current folder.

There are two ways to start the RL job:
- If you only need to have a quick look and try to start an out-of-the-box workflow, just run `python .\examples\rl\run_rl_example.py PATH_TO_CONFIG_YAML`. For example, `python .\examples\rl\run_rl_example.py .\examples\rl\cim.yml` will run the complete example RL training workflow of the CIM scenario. If you only want to run the evaluation workflow, you can start the job with `--evaluate_only`.
- If you only need to have a quick look and try to start an out-of-the-box workflow, just run `python .\examples\rl\run.py PATH_TO_CONFIG_YAML`. For example, `python .\examples\rl\run.py .\examples\rl\cim.yml` will run the complete example RL training workflow of the CIM scenario. If you only want to run the evaluation workflow, you can start the job with `--evaluate_only`.
- (**Requires installing MARO from source**) You could also start the job through the MARO CLI. Use the command `maro local run [-c] path/to/your/config` to run in containerized (with `-c`) or non-containerized (without `-c`) environments. Similarly, you could add `--evaluate_only` if you only need to run the evaluation workflow.

## Create Your Own Scenarios
9 changes: 5 additions & 4 deletions examples/rl/cim.yml
@@ -5,16 +5,17 @@
# Please refer to `maro/rl/workflows/config/template.yml` for the complete template and detailed explanations.

# Run this workflow by executing one of the following commands:
# - python .\examples\rl\run_rl_example.py .\examples\rl\cim.yml
# - (Requires installing MARO from source) maro local run .\examples\rl\cim.yml
# - python ./examples/rl/run.py ./examples/rl/cim.yml
# - (Requires installing MARO from source) maro local run ./examples/rl/cim.yml

job: cim_rl_workflow
scenario_path: "examples/cim/rl"
log_path: "log/rl_job/cim.txt"
log_path: "log/cim_rl/"
main:
num_episodes: 30 # Number of episodes to run. Each episode is one cycle of roll-out and training.
num_steps: null
eval_schedule: 5
early_stop_patience: 5
logging:
stdout: INFO
file: DEBUG
@@ -27,7 +28,7 @@ training:
load_path: null
load_episode: null
checkpointing:
path: "checkpoint/rl_job/cim"
path: "log/cim_rl/checkpoints"
interval: 5
logging:
stdout: INFO
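
`early_stop_patience: 5` is new in this config. Assuming the usual semantics — stop when the monitored metric has not improved for that many consecutive evaluations; the actual trigger lives in the workflow code, which is not part of this diff — the logic reduces to something like:

```python
def should_stop(history, patience):
    # history: monitored metric per evaluation, higher is better
    # (cf. monitor_metrics() returning -container_shortage above).
    if len(history) <= patience:
        return False
    best_before = max(history[:-patience])
    return all(v <= best_before for v in history[-patience:])

assert should_stop([1.0, 2.0, 1.9, 1.8, 1.7, 1.6, 1.5], patience=5)
assert not should_stop([1.0, 2.0, 2.1], patience=5)
```
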
10 changes: 5 additions & 5 deletions examples/rl/cim_distributed.yml
@@ -1,16 +1,16 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

# Example RL config file for CIM scenario.
# Example RL config file for CIM scenario (distributed version).
# Please refer to `maro/rl/workflows/config/template.yml` for the complete template and detailed explanations.

# Run this workflow by executing one of the following commands:
# - python .\examples\rl\run_rl_example.py .\examples\rl\cim.yml
# - (Requires installing MARO from source) maro local run .\examples\rl\cim.yml
# - python ./examples/rl/run.py ./examples/rl/cim_distributed.yml
# - (Requires installing MARO from source) maro local run ./examples/rl/cim_distributed.yml

job: cim_rl_workflow
scenario_path: "examples/cim/rl"
log_path: "log/rl_job/cim.txt"
log_path: "log/cim_rl/"
main:
num_episodes: 30 # Number of episodes to run. Each episode is one cycle of roll-out and training.
num_steps: null
@@ -35,7 +35,7 @@ training:
load_path: null
load_episode: null
checkpointing:
path: "checkpoint/rl_job/cim"
path: "log/cim_rl/checkpoints"
interval: 5
proxy:
host: "127.0.0.1"
File renamed without changes.
8 changes: 4 additions & 4 deletions examples/rl/vm_scheduling.yml
@@ -5,12 +5,12 @@
# Please refer to `maro/rl/workflows/config/template.yml` for the complete template and detailed explanations.

# Run this workflow by executing one of the following commands:
# - python .\examples\rl\run_rl_example.py .\examples\rl\vm_scheduling.yml
# - (Requires installing MARO from source) maro local run .\examples\rl\vm_scheduling.yml
# - python ./examples/rl/run.py ./examples/rl/vm_scheduling.yml
# - (Requires installing MARO from source) maro local run ./examples/rl/vm_scheduling.yml

job: vm_scheduling_rl_workflow
scenario_path: "examples/vm_scheduling/rl"
log_path: "log/rl_job/vm_scheduling.txt"
log_path: "log/vm_rl/"
main:
num_episodes: 30 # Number of episodes to run. Each episode is one cycle of roll-out and training.
num_steps: null
@@ -27,7 +27,7 @@ training:
load_path: null
load_episode: null
checkpointing:
path: "checkpoint/rl_job/vm_scheduling"
path: "log/vm_rl/checkpoints"
interval: 5
logging:
stdout: INFO
2 changes: 2 additions & 0 deletions examples/vm_scheduling/rl/algorithms/ac.py
@@ -11,6 +11,7 @@
actor_net_conf = {
"hidden_dims": [64, 32, 32],
"activation": torch.nn.LeakyReLU,
"output_activation": torch.nn.LeakyReLU,
"softmax": True,
"batch_norm": False,
"head": True,
@@ -19,6 +20,7 @@
critic_net_conf = {
"hidden_dims": [256, 128, 64],
"activation": torch.nn.LeakyReLU,
"output_activation": torch.nn.LeakyReLU,
"softmax": False,
"batch_norm": False,
"head": True,
1 change: 1 addition & 0 deletions examples/vm_scheduling/rl/algorithms/dqn.py
@@ -14,6 +14,7 @@
q_net_conf = {
"hidden_dims": [64, 128, 256],
"activation": torch.nn.LeakyReLU,
"output_activation": torch.nn.LeakyReLU,
"softmax": False,
"batch_norm": False,
"skip_connection": False,
2 changes: 1 addition & 1 deletion maro/__misc__.py
@@ -2,6 +2,6 @@
# Licensed under the MIT license.


__version__ = "0.3.1a2"
__version__ = "0.3.2a1"

__data_version__ = "0.2"
5 changes: 2 additions & 3 deletions maro/cli/data_pipeline/citi_bike.py
@@ -8,7 +8,6 @@
from enum import Enum

import geopy.distance
import numpy as np
import pandas as pd
from yaml import safe_load

@@ -320,7 +319,7 @@ def _process_distance(self, station_info: pd.DataFrame):
0,
index=station_info["station_index"],
columns=station_info["station_index"],
dtype=np.float,
dtype=float,
)
look_up_df = station_info[["latitude", "longitude"]]
return distance_adj.apply(
@@ -617,7 +616,7 @@ def _gen_distance(self, station_init: pd.DataFrame):
0,
index=station_init["station_index"],
columns=station_init["station_index"],
dtype=np.float,
dtype=float,
)
look_up_df = station_init[["latitude", "longitude"]]
distance_df = distance_adj.apply(
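
The `dtype=np.float` → `dtype=float` changes in this file are a compatibility fix: `np.float` was deprecated in NumPy 1.20 and removed in 1.24, and it was only ever an alias for the builtin `float`, so the replacement is behavior-preserving. For instance:

```python
import pandas as pd

# Equivalent to the old pd.DataFrame(..., dtype=np.float): the builtin float
# maps to the same float64 dtype without touching the removed NumPy alias.
adj = pd.DataFrame(0, index=range(3), columns=range(3), dtype=float)
print(adj.dtypes.unique())  # [dtype('float64')]
```
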
5 changes: 3 additions & 2 deletions maro/cli/local/commands.py
@@ -61,7 +61,7 @@ def get_redis_conn(port=None):


# Functions executed on CLI commands
def run(conf_path: str, containerize: bool = False, evaluate_only: bool = False, **kwargs):
def run(conf_path: str, containerize: bool = False, seed: int = None, evaluate_only: bool = False, **kwargs):
# Load job configuration file
parser = ConfigParser(conf_path)
if containerize:
@@ -71,13 +71,14 @@ def run(conf_path: str, containerize: bool = False, evaluate_only: bool = False,
LOCAL_MARO_ROOT,
DOCKERFILE_PATH,
DOCKER_IMAGE_NAME,
seed=seed,
evaluate_only=evaluate_only,
)
except KeyboardInterrupt:
stop_rl_job_with_docker_compose(parser.config["job"], LOCAL_MARO_ROOT)
else:
try:
start_rl_job(parser, LOCAL_MARO_ROOT, evaluate_only=evaluate_only)
start_rl_job(parser, LOCAL_MARO_ROOT, seed=seed, evaluate_only=evaluate_only)
except KeyboardInterrupt:
sys.exit(1)

12 changes: 9 additions & 3 deletions maro/cli/local/utils.py
@@ -4,7 +4,7 @@
import os
import subprocess
from copy import deepcopy
from typing import List
from typing import List, Optional

import docker
import yaml
@@ -110,12 +110,15 @@ def exec(cmd: str, env: dict, debug: bool = False) -> subprocess.Popen:
def start_rl_job(
parser: ConfigParser,
maro_root: str,
seed: Optional[int],
evaluate_only: bool,
background: bool = False,
) -> List[subprocess.Popen]:
procs = [
exec(
f"python {script}" + ("" if not evaluate_only else " --evaluate_only"),
f"python {script}"
+ ("" if not evaluate_only else " --evaluate_only")
+ ("" if seed is None else f" --seed {seed}"),
format_env_vars({**env, "PYTHONPATH": maro_root}, mode="proc"),
debug=not background,
)
@@ -169,6 +172,7 @@ def start_rl_job_with_docker_compose(
context: str,
dockerfile_path: str,
image_name: str,
seed: Optional[int],
evaluate_only: bool,
) -> None:
common_spec = {
@@ -185,7 +189,9 @@
**deepcopy(common_spec),
**{
"container_name": component,
"command": f"python3 {script}" + ("" if not evaluate_only else " --evaluate_only"),
"command": f"python3 {script}"
+ ("" if not evaluate_only else " --evaluate_only")
+ ("" if seed is None else f" --seed {seed}"),
"environment": format_env_vars(env, mode="docker-compose"),
},
}
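
Both launch paths now thread an optional `--seed` flag through to the workflow scripts. On the receiving end this presumably pairs with an argparse option roughly like the following (a sketch; the actual workflow entry point is not part of this diff):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--seed", type=int, default=None,
                    help="Global random seed; omitting it leaves the run non-deterministic.")
parser.add_argument("--evaluate_only", action="store_true")

# e.g. what a launch command carrying "--seed 42" would produce:
args = parser.parse_args(["--seed", "42"])
assert args.seed == 42 and not args.evaluate_only
```
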
8 changes: 7 additions & 1 deletion maro/rl/model/abs_net.py
@@ -4,7 +4,7 @@
from __future__ import annotations

from abc import ABCMeta
from typing import Any, Dict
from typing import Any, Dict, Optional

import torch.nn
from torch.optim import Optimizer
@@ -18,6 +18,8 @@ class AbsNet(torch.nn.Module, metaclass=ABCMeta):
def __init__(self) -> None:
super(AbsNet, self).__init__()

self._device: Optional[torch.device] = None

@property
def optim(self) -> Optimizer:
optim = getattr(self, "_optim", None)
@@ -119,3 +121,7 @@ def unfreeze_all_parameters(self) -> None:
"""Unfreeze all parameters."""
for p in self.parameters():
p.requires_grad = True

def to_device(self, device: torch.device) -> None:
self._device = device
self.to(device)
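
The new `to_device()` both moves the module and records the target device, so later tensor allocations inside the net can reuse `self._device`. A minimal usage sketch (the concrete subclass here is hypothetical, standing in for an `AbsNet` implementation):

```python
import torch

class TinyNet(torch.nn.Module):  # stand-in for an AbsNet subclass
    def __init__(self):
        super().__init__()
        self._device = None
        self.fc = torch.nn.Linear(4, 2)

    def to_device(self, device: torch.device) -> None:
        self._device = device  # remembered for later tensor creation
        self.to(device)

net = TinyNet()
net.to_device(torch.device("cuda" if torch.cuda.is_available() else "cpu"))
x = torch.zeros(1, 4, device=net._device)  # allocations follow the net's device
print(net.fc(x).shape)  # torch.Size([1, 2])
```
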
17 changes: 13 additions & 4 deletions maro/rl/model/algorithm_nets/ac_based.py
@@ -43,14 +43,23 @@ class ContinuousACBasedNet(ContinuousPolicyNet, metaclass=ABCMeta):
- set_state(self, net_state: dict) -> None:
"""

def _get_actions_impl(self, states: torch.Tensor, exploring: bool) -> torch.Tensor:
actions, _ = self._get_actions_with_logps_impl(states, exploring)
def _get_actions_impl(self, states: torch.Tensor, exploring: bool, **kwargs) -> torch.Tensor:
actions, _ = self._get_actions_with_logps_impl(states, exploring, **kwargs)
return actions

def _get_actions_with_probs_impl(self, states: torch.Tensor, exploring: bool) -> Tuple[torch.Tensor, torch.Tensor]:
def _get_actions_with_probs_impl(
self,
states: torch.Tensor,
exploring: bool,
**kwargs,
) -> Tuple[torch.Tensor, torch.Tensor]:
# Not used in Actor-Critic or PPO
pass

def _get_states_actions_probs_impl(self, states: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
def _get_states_actions_probs_impl(self, states: torch.Tensor, actions: torch.Tensor, **kwargs) -> torch.Tensor:
# Not used in Actor-Critic or PPO
pass

def _get_random_actions_impl(self, states: torch.Tensor, **kwargs) -> torch.Tensor:
# Not used in Actor-Critic or PPO
pass
22 changes: 18 additions & 4 deletions maro/rl/model/algorithm_nets/ddpg.py
@@ -25,18 +25,32 @@ class ContinuousDDPGNet(ContinuousPolicyNet, metaclass=ABCMeta):
- set_state(self, net_state: dict) -> None:
"""

def _get_actions_with_probs_impl(self, states: torch.Tensor, exploring: bool) -> Tuple[torch.Tensor, torch.Tensor]:
def _get_actions_with_probs_impl(
self,
states: torch.Tensor,
exploring: bool,
**kwargs,
) -> Tuple[torch.Tensor, torch.Tensor]:
# Not used in DDPG
pass

def _get_actions_with_logps_impl(self, states: torch.Tensor, exploring: bool) -> Tuple[torch.Tensor, torch.Tensor]:
def _get_actions_with_logps_impl(
self,
states: torch.Tensor,
exploring: bool,
**kwargs,
) -> Tuple[torch.Tensor, torch.Tensor]:
# Not used in DDPG
pass

def _get_states_actions_probs_impl(self, states: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
def _get_states_actions_probs_impl(self, states: torch.Tensor, actions: torch.Tensor, **kwargs) -> torch.Tensor:
# Not used in DDPG
pass

def _get_states_actions_logps_impl(self, states: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
def _get_states_actions_logps_impl(self, states: torch.Tensor, actions: torch.Tensor, **kwargs) -> torch.Tensor:
# Not used in DDPG
pass

def _get_random_actions_impl(self, states: torch.Tensor, **kwargs) -> torch.Tensor:
# Not used in DDPG
pass
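
The signature changes in `ac_based.py` and `ddpg.py` follow one pattern: every `_get_*_impl` hook now accepts `**kwargs`, so algorithm-specific extras can flow through the shared `ContinuousPolicyNet` interface without further signature churn, while hooks an algorithm does not use simply `pass`. Sketched in isolation (the base class here is a stand-in, not MARO's):

```python
from abc import ABC
import torch

class PolicyNetBase(ABC):  # stand-in for ContinuousPolicyNet
    def _get_actions_impl(self, states: torch.Tensor, exploring: bool, **kwargs) -> torch.Tensor:
        raise NotImplementedError

class DDPGLikeNet(PolicyNetBase):
    def _get_actions_impl(self, states: torch.Tensor, exploring: bool, **kwargs) -> torch.Tensor:
        # Extra keyword arguments are accepted (and ignored here) by design.
        return torch.tanh(states)

net = DDPGLikeNet()
print(net._get_actions_impl(torch.zeros(2, 3), exploring=False, temperature=0.5).shape)
```
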