A framework for human-informed reinforcement learning by subjective logic
- /01-experiment-setup - Input files for the experiment.
- /02-maps - Map files (.xlsx).
- /03-input - Input files: maps and human advice.
- /04-src - Source code.
  - Main
    - runner.py - Main module.
    - model.py - Model classes.
  - Advice/SL modules
    - advice_parser.py - Parses human input from /03-input. Input file naming convention: advice-[SIZE]x[SIZE]-seed[SEED].txt. Format: grid size [1] advice [*]
    - sl.py - Subjective logic utilities.
  - Map module
    - map_tools.py - Generator, renderer, and parser for maps. Saves maps under /02-maps as .xlsx files.
- /05-experiments-output - Experiment data as .csv files.
- /06-analysis-output - Analysis of experiment data from /05-experiments-output as .pdf files.
- /tests - Unit tests.
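
The advice file naming convention can be illustrated with a short sketch. This is not the repository's advice_parser.py; the function name, the AdviceFileName type, and the regular expression are assumptions made for illustration. The pattern also accepts the optional -[QUOTA] suffix used in the replication steps of this README.

```python
import re
from typing import NamedTuple, Optional


class AdviceFileName(NamedTuple):
    size: int
    seed: int
    quota: Optional[str]  # e.g. "all" or "holes"; None when there is no suffix


# Hypothetical pattern for advice-[SIZE]x[SIZE]-seed[SEED](-[QUOTA]).txt
_ADVICE_NAME = re.compile(r"^advice-(\d+)x(\d+)-seed(\d+)(?:-(.+))?\.txt$")


def parse_advice_filename(name: str) -> AdviceFileName:
    """Split an advice file name into size, seed, and optional quota."""
    match = _ADVICE_NAME.match(name)
    if match is None:
        raise ValueError(f"not an advice file name: {name!r}")
    rows, cols, seed, quota = match.groups()
    if rows != cols:
        # The convention repeats [SIZE], so the map is expected to be square.
        raise ValueError(f"map must be square, got {rows}x{cols}")
    return AdviceFileName(int(rows), int(seed), quota)
```

For example, parse_advice_filename("advice-6x6-seed10-all.txt") yields size 6, seed 10, and quota "all".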
- Clone this repository.
- Install requirements via pip install -r requirements.txt.
✏️ To replicate the experiment results as seen in the paper, follow the steps below with [SIZE] = 12 and [SEED] = 63. ✏️
- Generate a map by running
      python .\04-src\map_tools.py (--generate --render --size [SIZE] --seed [SEED]) | -default
  Replace [SIZE] and [SEED] with the integer values you need. The --render flag is optional. When run with the -default option, the default 4x4 map is generated. After generation, the map files are in the 02-maps folder.
- Create all twelve advice files in the 03-input folder, named advice-[SIZE]x[SIZE]-seed[SEED]-[QUOTA].txt (e.g., advice-6x6-seed10-all.txt). Quota = {'all', 'holes', 'human10', 'human5', 'coop5-A1-topleft', 'coop5-A1-topright', 'coop5-A2-bottomleft', 'coop5-A2-bottomright', 'coop10-A1-topleft', 'coop10-A1-topright', 'coop10-A2-bottomleft', 'coop10-A2-bottomright'}
  - Synthetic advice files can be generated by running
        python .\04-src\advice_tools.py --size [SIZE] --seed [SEED] -g [ALL|HOLES]
    ALL generates advice for all cells; HOLES generates advice for the holes and the goal. Other files must be created manually.
    - Advice values for frozen tiles in ALL: +1 if no neighboring holes; 0 if one neighboring hole; -1 otherwise.
    - ⚠️ Generated files will be in the 02-maps folder and must be moved to the 03-input folder before the next step. ⚠️
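
The rule for frozen tiles in ALL (+1 with no neighboring holes, 0 with one, -1 otherwise) can be sketched as below. This is a minimal illustration, not the code in advice_tools.py. It assumes a grid given as a list of strings with "F" for frozen tiles and "H" for holes, and it assumes "neighboring" means the four tiles directly up, down, left, and right. The repository's map encoding and neighborhood definition may differ.

```python
from typing import Dict, List, Tuple

# Assumed tile symbols (Frozen Lake convention); the repository's own
# map encoding in the .xlsx files may differ.
FROZEN, HOLE = "F", "H"


def frozen_tile_advice(grid: List[str]) -> Dict[Tuple[int, int], int]:
    """Return the advice value for every frozen tile, keyed by (row, col).

    Rule from the README: +1 with no neighboring holes, 0 with exactly one,
    -1 otherwise. Neighbors are assumed to be up/down/left/right.
    """
    advice = {}
    for r, row in enumerate(grid):
        for c, tile in enumerate(row):
            if tile != FROZEN:
                continue  # values for other tile types are not covered here
            holes = 0
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < len(grid) and 0 <= nc < len(grid[nr]):
                    holes += grid[nr][nc] == HOLE
            advice[(r, c)] = 1 if holes == 0 else 0 if holes == 1 else -1
    return advice
```

On the example grid ["SFFF", "FHFH", "FFFH", "HFFG"], the tile at row 1, column 2 touches two holes and receives -1, while the tile at row 0, column 2 touches none and receives +1.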
✏️ The files generated by the steps above for the experiments reported in the paper are located in the 01-experiment-setup folder. Copy them to the 03-input folder to skip the previous steps. ✏️
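
Before running the experiment, it can help to confirm that all twelve advice files are in place. The following sketch builds the expected file names from the quota list above and reports which ones are missing. The function names are hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import List

# The twelve quota values listed in this README.
QUOTAS = [
    "all", "holes", "human10", "human5",
    "coop5-A1-topleft", "coop5-A1-topright",
    "coop5-A2-bottomleft", "coop5-A2-bottomright",
    "coop10-A1-topleft", "coop10-A1-topright",
    "coop10-A2-bottomleft", "coop10-A2-bottomright",
]


def expected_advice_files(size: int, seed: int) -> List[str]:
    """File names following advice-[SIZE]x[SIZE]-seed[SEED]-[QUOTA].txt."""
    return [f"advice-{size}x{size}-seed{seed}-{quota}.txt" for quota in QUOTAS]


def missing_advice_files(input_dir: Path, size: int, seed: int) -> List[str]:
    """Expected advice files that are not present in input_dir."""
    return [
        name
        for name in expected_advice_files(size, seed)
        if not (input_dir / name).is_file()
    ]
```

Calling missing_advice_files(Path("03-input"), 12, 63) from the repository root returns an empty list once every file for the paper's setup is present.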
- Run the experiment using
      python .\04-src\runner.py
  - Mandatory parameter:
    - --mode [MODE] -- [MODE] is one of: random, noadvice, synthetic, coop.
  - Optional parameters:
    - --log [LOG_LEVEL] -- [LOG_LEVEL] is one of: critical, error, warn, warning, info, debug.
    - --name [STRING] -- The name of the experiment, used to name the top-level results folder. If not provided, the folder is named after datetime.now(), formatted as "%Y%m%d-%H%M%S".
  - Settings (size, seed, numexperiments, maxepisodes) can be set in runner.__name__.
  - Results will be generated into /05-experiments-output, under a timestamped folder, with the following folder structure:
    - [maxepisodes]
      - policy_data
        - advice-coop5-topleft-bottomright
          - One .csv file named after the map size and seed.
        - advice-coop5-topright-bottomleft
          - ...
        - advice-coop10-topleft6-bottomright
          - ...
        - advice-coop10-topright-bottomleft
          - ...
        - advice-synthetic-all
          - Multiple .csv files named after the map size, seed, and the u parameter used in the specific experiment.
        - advice-synthetic-holes
          - ...
        - advice-synthetic-human5
          - ...
        - advice-synthetic-human10
          - ...
        - noadvice
          - One .csv file named after the map size and seed.
        - random
          - ...
      - reward_data
        - ...
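
The naming of the results folder and the layout beneath it can be sketched as follows. This is an illustration only and not code from runner.py: both function names are hypothetical, and the sketch assumes that the [maxepisodes] folder is named after the number of episodes.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional


def experiment_folder_name(name: Optional[str] = None,
                           now: Optional[datetime] = None) -> str:
    """Use the given name if any, else a timestamp formatted as %Y%m%d-%H%M%S."""
    if name:
        return name
    return (now or datetime.now()).strftime("%Y%m%d-%H%M%S")


def csv_files_by_condition(run_dir: Path, max_episodes: int,
                           kind: str = "policy_data") -> Dict[str, List[str]]:
    """Map each condition folder (e.g. 'noadvice') to its sorted .csv file names.

    kind is either 'policy_data' or 'reward_data'.
    """
    base = run_dir / str(max_episodes) / kind  # assumed folder naming
    return {
        folder.name: sorted(f.name for f in folder.glob("*.csv"))
        for folder in sorted(base.iterdir())
        if folder.is_dir()
    }
```

A run started without --name on 31 January 2024 at 09:05:07 would, under this scheme, write to a folder named 20240131-090507.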
- Run the analysis using
      python .\04-src\analysis.py -a [METHOD_NAME] -s [True|False] --log [LOG_LEVEL]
  - Optional parameters:
    - -a [METHOD_NAME] -- [METHOD_NAME] is one of: cumulative_reward, heatmap.
    - -s [True|False] -- Stash folder results.
    - --log [LOG_LEVEL] -- [LOG_LEVEL] is one of: critical, error, warn, warning, info, debug.
  - Results will be generated into /06-analysis-output.
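
As a rough illustration of what a cumulative reward curve represents, the sketch below computes the running total of per-episode rewards. It is an assumption about the general technique, not a description of analysis.py, whose input format and aggregation across experiments are not documented here.

```python
from itertools import accumulate
from typing import Iterable, List


def cumulative_reward(episode_rewards: Iterable[float]) -> List[float]:
    """Running total of per-episode rewards, one entry per episode."""
    return list(accumulate(episode_rewards))
```

With rewards of 0, 1, 0, 1, 1 over five episodes, the curve is 0, 1, 1, 2, 3.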