GitHub - Maryia-M/episodic-curiosity: Tensorflow/Keras code and trained models for Episodic Curiosity Through Reachability. My updates to run it with Python 3.6 (original version works only with Python2.7).

Episodic Curiosity Through Reachability

In ICLR 2019 [Project Website][Paper]

These are my updates to run the code with Python 3.6. To install deepmind_lab, you can use my prebuilt wheel instead of building it as described in the original instructions below.

Nikolay Savinov¹, Anton Raichuk², Raphaël Marinier², Damien Vincent², Marc Pollefeys¹, Timothy Lillicrap³, Sylvain Gelly²
¹ETH Zurich, ²Google AI, ³DeepMind

[Videos: Navigation out of curiosity, Locomotion out of curiosity]

This is an implementation of our ICLR 2019 paper, Episodic Curiosity Through Reachability. If you use this work, please cite:

@inproceedings{Savinov2019_EC,
    Author = {Savinov, Nikolay and Raichuk, Anton and Marinier, Rapha{\"e}l and Vincent, Damien and Pollefeys, Marc and Lillicrap, Timothy and Gelly, Sylvain},
    Title = {Episodic Curiosity through Reachability},
    Booktitle = {International Conference on Learning Representations ({ICLR})},
    Year = {2019}
}

Requirements

The code was tested on Linux only. The code assumes that the command "python" invokes Python 2.7. We recommend using virtualenv:

sudo apt-get install python-pip
pip install virtualenv
python -m virtualenv episodic_curiosity_env
source episodic_curiosity_env/bin/activate

Installation

Clone this repository:

git clone https://github.com/google-research/episodic-curiosity.git
cd episodic-curiosity

We require a modified version of DeepMind Lab:

Clone DeepMind Lab:

git clone https://github.com/deepmind/lab
cd lab

Apply our patch to DeepMind Lab:

git checkout 7b851dcbf6171fa184bf8a25bf2c87fe6d3f5380
git checkout -b modified_dmlab
git apply ../third_party/dmlab/dmlab_min_goal_distance.patch

Install DMLab as a pip module by following these instructions.

In a nutshell, once you've installed DMLab dependencies, you need to run:

bazel build -c opt python/pip_package:build_pip_package
./bazel-bin/python/pip_package/build_pip_package /tmp/dmlab_pkg
pip install /tmp/dmlab_pkg/DeepMind_Lab-1.0-py2-none-any.whl --force-reinstall

If you wish to run Mujoco experiments (section S1 of the paper), you need to install dm_control and its dependencies. See this documentation, and replace pip install -e . with pip install -e .[mujoco] in the command below.

Finally, install episodic curiosity and its pip dependencies:

cd episodic-curiosity
pip install -e .

Resource requirements for training

Environment	Training method	Required GPU	Recommended RAM
DMLab	PPO	No	32 GB
DMLab	PPO + Grid Oracle	No	32 GB
DMLab	PPO + EC using already trained R-networks	No	32 GB
DMLab	PPO + EC with R-network training	Yes; otherwise training is more than 20x slower. Required GPU RAM: 5 GB	50 GB. Tip: reduce `dataset_buffer_size` to use less RAM, at the expense of policy performance.
DMLab	PPO + ECO	Yes; otherwise training is more than 20x slower. Required GPU RAM: 5 GB	80 GB. Tip: reduce `observation_history_size` to use less RAM, at the expense of policy performance.
Mujoco	PPO + EC using already trained R-networks	No	32 GB

Trained models

Trained R-networks and policies can be found in the episodic-curiosity Google cloud bucket. You can access them via the web interface, or copy them with the gsutil command from the Google Cloud SDK:

gsutil -m cp -r gs://episodic-curiosity/r_networks .
gsutil -m cp -r gs://episodic-curiosity/policies .

Example command to visualize a trained policy over two episodes of 1000 steps each, and to create videos similar to the ones at the top of this page:

python -m episodic_curiosity.visualize_curiosity_reward --workdir=/tmp/ec_visualizations --r_net_weights=<path_to_r_network> --policy_path=<path_to_trained_policy> --alsologtostderr --num_episodes=2 --num_steps=1000 --visualization_type=surrogate_reward --trajectory_mode=do_nothing

This requires installing the extra dependencies for generating videos, with pip install -e .[video].

Training

On a single machine

scripts/launcher_script.py is the main entry point to reproduce the results of Table 1 in the paper. For instance, the following command line launches training of the PPO + EC method on the Sparse+Doors scenario:

python episodic-curiosity/scripts/launcher_script.py --workdir=/tmp/ec_workdir --method=ppo_plus_ec --scenario=sparseplusdoors

Main flags:

Flag	Description
--method	Solving method to use. Corresponds to the rows of Table 1 in the paper. Possible values: `ppo`, `ppo_plus_ec`, `ppo_plus_eco`, `ppo_plus_grid_oracle`.
--scenario	Scenario to launch. Corresponds to the columns of Table 1 in the paper. Possible values: `noreward`, `norewardnofire`, `sparse`, `verysparse`, `sparseplusdoors`, `dense1`, `dense2`. `ant_no_reward` is also supported and corresponds to the first row of Table S1.
--workdir	Directory where logs and checkpoints will be stored.
--run_number	Run number of the current run. This is used to create an appropriate subdirectory in the workdir.
--r_networks_path	Only meaningful for the `ppo_plus_ec` method. Path to the root directory of pre-trained R-networks. If specified, the policy is trained using those pre-trained R-networks. If not specified, we first generate the R-network training data, then train the R-network, and then train the policy.
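The flags in the table above can also be assembled programmatically, for example when scripting several runs. The following is a minimal sketch, not part of this repository: `build_launcher_command` is a hypothetical helper, the allowed values are copied from the table, and passing `--run_number` as an integer is an assumption. Rejecting `--r_networks_path` for methods other than `ppo_plus_ec` is a choice made for this sketch; the table only says the flag is not meaningful there.

```python
import shlex

# Allowed values, copied from the flag table above.
METHODS = ("ppo", "ppo_plus_ec", "ppo_plus_eco", "ppo_plus_grid_oracle")
SCENARIOS = ("noreward", "norewardnofire", "sparse", "verysparse",
             "sparseplusdoors", "dense1", "dense2", "ant_no_reward")


def build_launcher_command(workdir, method, scenario,
                           run_number=None, r_networks_path=None):
    """Hypothetical helper: returns the launcher argv as a list of strings."""
    if method not in METHODS:
        raise ValueError("unknown method: %s" % method)
    if scenario not in SCENARIOS:
        raise ValueError("unknown scenario: %s" % scenario)
    # Sketch-level choice: refuse a flag that the table calls meaningless here.
    if r_networks_path is not None and method != "ppo_plus_ec":
        raise ValueError("--r_networks_path only applies to ppo_plus_ec")
    argv = ["python", "episodic-curiosity/scripts/launcher_script.py",
            "--workdir=%s" % workdir,
            "--method=%s" % method,
            "--scenario=%s" % scenario]
    if run_number is not None:
        # Assumption: the run number is passed as a plain integer.
        argv.append("--run_number=%d" % run_number)
    if r_networks_path is not None:
        argv.append("--r_networks_path=%s" % r_networks_path)
    return argv


cmd = build_launcher_command("/tmp/ec_workdir", "ppo_plus_ec", "sparseplusdoors")
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed to `subprocess.run` as is; printing it first makes it easy to compare against the example command line above.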

Training takes a couple of days. We used CPUs with 16 hyper-threads, but smaller CPUs should do.

Under the hood, launcher_script.py launches train_policy.py with the right hyperparameters. For the method ppo_plus_ec, it first launches generate_r_training_data.py to accumulate training data for the R-network using a random policy, then launches train_r.py to train the R-network, and finally train_policy.py for the policy. In the method ppo_plus_eco, all this happens online as part of the policy training.
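The order of stages described above can be summarized in a few lines. This is a sketch of the described behavior only, not the launcher's actual code: `planned_stages` is a hypothetical name, and the rule that pre-trained R-networks skip the first two stages is taken from the description of `--r_networks_path`.

```python
def planned_stages(method, r_networks_path=None):
    """Hypothetical summary: script names in the order they are launched."""
    if method == "ppo_plus_ec" and r_networks_path is None:
        # No pre-trained R-networks: collect data, train R, then train policy.
        return ["generate_r_training_data.py", "train_r.py", "train_policy.py"]
    # All other cases go straight to policy training. For ppo_plus_eco the
    # R-network is trained online inside the policy training itself.
    return ["train_policy.py"]


for method in ("ppo", "ppo_plus_ec", "ppo_plus_eco", "ppo_plus_grid_oracle"):
    print(method, "->", " -> ".join(planned_stages(method)))
```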

On Google Cloud

First, make sure you have the Google Cloud SDK installed.

scripts/launch_cloud_vms.py is the main entry point. Edit the script and replace the FILL-MEs with the details of your GCP project. In particular, you will need to point it to a GCP disk snapshot with the installed dependencies as described in the Installation section.

IMPORTANT: By default the script reproduces all results in Table 1 and launches ~300 VMs on cloud with GPUs (7 scenarios x 4 methods x 10 runs). The cost of running all those VMs is very significant: on the order of USD 30 per day per VM, based on early 2019 GCP pricing. Pass --i_understand_launching_vms_is_expensive to scripts/launch_cloud_vms.py to confirm that you understand this.

Under the hood, launch_cloud_vms.py launches one VM for each (scenario, method, run_number) tuple. The VMs use startup scripts to launch training, and retrieve the parameters of the run through Instance Metadata.
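To get a feel for the scale, the (scenario, method, run_number) tuples can be enumerated directly. The sketch below is illustrative only and does not reflect how launch_cloud_vms.py is written. It assumes the 7 scenarios are the Table 1 ones listed for `--scenario`, that run numbers start at 0, and it uses the rough USD 30 per VM per day figure quoted above.

```python
import itertools

# Assumption: the 7 scenarios are the Table 1 columns listed for --scenario.
SCENARIOS = ["noreward", "norewardnofire", "sparse", "verysparse",
             "sparseplusdoors", "dense1", "dense2"]
METHODS = ["ppo", "ppo_plus_ec", "ppo_plus_eco", "ppo_plus_grid_oracle"]
NUM_RUNS = 10
USD_PER_VM_PER_DAY = 30  # order-of-magnitude figure, early 2019 GCP pricing

# One VM per (scenario, method, run_number) tuple.
# Assumption: run numbers are 0-based.
jobs = list(itertools.product(SCENARIOS, METHODS, range(NUM_RUNS)))
daily_cost_usd = len(jobs) * USD_PER_VM_PER_DAY

print(len(jobs))        # 280
print(daily_cost_usd)   # 8400
```

Under these assumptions, a full reproduction costs on the order of USD 8,400 per day while all VMs are running, which is why the confirmation flag exists.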

TIP: Use sudo journalctl -u google-startup-scripts.service to see the logs of the startup script.

Training logs

Each training job stores logs and checkpoints in a workdir. The workdir is organized as follows:

File or Directory	Description
`r_training_data/{R_TRAINING,VALIDATION}/`	TF Records with data generated from a random policy for R-network training. Only for method `ppo_plus_ec` without supplying pre-trained R-networks.
`r_networks/`	Keras checkpoints of trained R-networks. Only for method `ppo_plus_ec` without supplying pre-trained R-networks.
`reward_{train,valid,test}.csv`	CSV files with {train,valid,test} rewards, tracking the performance of the policy at multiple training steps.
`checkpoints/`	Checkpoints of the policy.
`log.txt`, `progress.csv`	Training logs and CSV from OpenAI's PPO2 code.

On cloud, the workdir of each job will be synced to a cloud bucket directory of the form <cloud_bucket_root>/<vm_id>/<method>/<scenario>/run_number_<d>/.
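When locating the output of a particular job, the bucket path can be built from its parameters. This is a minimal sketch: `cloud_workdir` is a hypothetical helper, the bucket root and VM id in the example are made up, and reading `<d>` as a decimal run number is an assumption.

```python
import posixpath


def cloud_workdir(cloud_bucket_root, vm_id, method, scenario, run_number):
    """Hypothetical helper: bucket directory a job's workdir is synced to."""
    return posixpath.join(cloud_bucket_root, vm_id, method, scenario,
                          "run_number_%d" % run_number) + "/"


# Made-up bucket root and VM id, for illustration only.
print(cloud_workdir("gs://my-bucket/ec", "vm-0", "ppo_plus_ec",
                    "sparseplusdoors", 3))
```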

We provide a colab to plot graphs during training of the policies, using data from the reward_{train,valid,test}.csv files.

Related projects

Check out the code for Semi-parametric Topological Memory, which uses a graph-based episodic memory constructed from a short video to navigate in novel environments. It thus provides an exploitation policy that complements the exploration policy in this work.

Known limitations

As of 2019/02/20, the ppo_plus_eco method is not robust to restarts, because the R-network trained online is not checkpointed.
