ML Reproducibility Challenge 2020 is a community challenge for machine learning enthusiasts, students, and researchers in which participants select a paper from one of the prestigious ML conferences of the year (NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, or ECCV) and attempt to replicate it, providing additional support either for or against the claims and results of the work. Alongside evaluating the validity and legitimacy of recent research, the project primarily serves to assess the reproducibility and replicability of machine learning research.
Here is a collection of my work-in-progress code for the challenge. I am working on Discovering Reinforcement Learning Algorithms by DeepMind.
This code is not expected to be well organized until near the end of the project; for now, I am simply dropping the current source code into this repo each time I update it.
Thursday, November 5, 2020
- Implemented all Grid World environments, including Tabular Grid World and Random Grid World, and the five maps for each type of grid world.
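To make the environment structure concrete, here is a minimal sketch of what a tabular grid world might look like in the usual `reset`/`step` style. The map layout, reward placement, and class/method names are illustrative assumptions, not the repo's actual implementation.

```python
# Minimal tabular grid world sketch (illustrative, not the repo's code).
# The agent moves on a small grid; stepping onto a reward cell collects
# that reward exactly once.

class TabularGridWorld:
    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right

    def __init__(self, size=5, rewards=None, max_steps=50):
        self.size = size
        # Hypothetical default map: one positive and one negative reward cell.
        self.initial_rewards = rewards or {(4, 4): 1.0, (2, 3): -1.0}
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        self.pos = (0, 0)
        self.rewards = dict(self.initial_rewards)
        self.steps = 0
        return self._obs()

    def _obs(self):
        # Tabular setting: the observation is just the discrete state index.
        return self.pos[0] * self.size + self.pos[1]

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        r = min(max(self.pos[0] + dr, 0), self.size - 1)
        c = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (r, c)
        reward = self.rewards.pop(self.pos, 0.0)  # collect at most once
        self.steps += 1
        done = not self.rewards or self.steps >= self.max_steps
        return self._obs(), reward, done, {}
```

Collecting a reward removes it from the live `rewards` dict, which is also what distinguishes a lit (uncollected) square from a collected one when rendering.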
Sunday, November 8, 2020
- Implemented all Delayed Chain MDP environments, which includes 4 standard maps and 1 unique mode.
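The defining trait of a delayed chain MDP is that only the very first action matters, with the reward withheld until the end of the chain. The sketch below captures that idea; the chain length, binary action space, and ±1 reward scheme are assumptions for illustration, not the exact maps used in the repo.

```python
import random

# Sketch of a delayed chain MDP (illustrative assumptions throughout):
# the first action determines the reward delivered on the final step,
# and all intermediate rewards are zero.

class DelayedChainMDP:
    def __init__(self, length=10, seed=None):
        self.length = length
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.t = 0
        self.first_action = None
        # Which of the two first actions is "correct" this episode.
        self.good_action = self.rng.randint(0, 1)
        return self.t

    def step(self, action):
        if self.t == 0:
            self.first_action = action  # only this choice matters
        self.t += 1
        done = self.t >= self.length
        # Reward is delayed all the way to the final transition.
        if done:
            reward = 1.0 if self.first_action == self.good_action else -1.0
        else:
            reward = 0.0
        return self.t, reward, done, {}
```

This long gap between the decisive action and its reward is what makes the environment a stress test for credit assignment.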
Saturday, November 14, 2020
- Rewrote the Grid World environments, doubling the speed of simulation and improving the rendering graphics
Tuesday, November 17, 2020
- Wrote the `Agent` abstract class and all concrete agent subclasses, including `TabularAgent` (for the Tabular Grid World environment), `FunctionalAgent` (for environments demanding function approximation, i.e. Random Grid World and Delayed Chain MDP + State Distraction), and `BinaryAgent` (for the standard Delayed Chain MDP environments without state distraction).
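A rough sketch of what this hierarchy might look like: an abstract `Agent` interface with one concrete tabular subclass. The method names are assumptions, and the epsilon-greedy Q-learning update below is only a stand-in to show the interface, not the learned LPG update the project ultimately targets.

```python
from abc import ABC, abstractmethod
import random

class Agent(ABC):
    """Hypothetical interface shared by all agent classes."""

    @abstractmethod
    def act(self, observation):
        """Select an action for the current observation."""

    @abstractmethod
    def update(self, s, a, r, s2, done):
        """Apply a learning update from one transition."""

class TabularAgent(Agent):
    """Keeps one row of parameters per discrete state (tabular setting)."""

    def __init__(self, n_states, n_actions, epsilon=0.1, lr=0.1, gamma=0.9):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.epsilon, self.lr, self.gamma = epsilon, lr, gamma
        self.n_actions = n_actions

    def act(self, observation):
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        row = self.q[observation]
        return row.index(max(row))

    def update(self, s, a, r, s2, done):
        # Placeholder one-step update; the real project swaps in the
        # update rule produced by the LPG meta-learner.
        target = r if done else r + self.gamma * max(self.q[s2])
        self.q[s][a] += self.lr * (target - self.q[s][a])
```

A `FunctionalAgent` would replace the table with a small network over observation features, keeping the same `act`/`update` interface.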
Wednesday, November 18, 2020
- Wrote the `LPG` model class and the Embedding layer it uses to encode the categorical prediction vector.
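One plausible reading of embedding a categorical prediction vector is to take a weighted sum of learned embedding rows, with the vector's entries as weights. The function below sketches that reading in plain Python; the dimensions and the weighting scheme are assumptions, not the paper's exact architecture.

```python
# Illustrative encoding of an m-dimensional categorical prediction vector y
# via a learned m-by-d embedding matrix: out = y @ E (weighted row sum).
# This is an assumption about the mechanism, not the repo's actual layer.

def embed(y, embedding):
    """Sum the embedding rows, each weighted by the matching entry of y."""
    dim = len(embedding[0])
    out = [0.0] * dim
    for weight, row in zip(y, embedding):
        for j in range(dim):
            out[j] += weight * row[j]
    return out
```

A one-hot `y` reduces this to a plain embedding lookup, which is why the same layer can serve both soft and hard categorical predictions.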
Sunday, November 22, 2020
- Began the first tests of a simple implementation; still a work in progress, but the overall code is underway.
Saturday, November 28, 2020
- Began writing the final code implementation.
- [x] Read DeepMind's *Discovering Reinforcement Learning Algorithms*
- [x] Write the gym class for the custom Grid World environments
- [x] Write the gym class for the custom MDP environments
- [x] Fix a rendering bug in the Random Grid World environment that caused some squares to remain lit even after the reward on that square had been collected
- [x] Write a custom TensorFlow model for the Learned Policy Gradient architecture
- [x] Write classes for the various agent structures for each training environment
  - [x] `Agent` abstract class
  - [x] `TabularAgent` for Tabular Grid World
  - [x] `FunctionalAgent` for Random Grid World and Delayed Chain MDP + State Distraction
  - [x] `BinaryAgent` for Delayed Chain MDP
- [ ] Implement the agent update
- [ ] Implement the Learned Policy Gradient algorithm
- [ ] Train on each environment
- [ ] Test the learned update rule on each Atari environment
  - [ ] Write a model class for the network architecture specified by the authors: C(32)-C(64)-C(64)-D(512)
  - [ ] Test performance over 20 million frames on each of Montezuma's Revenge, Ms. Pac-Man, Riverraid, Pitfall, and Tutankham (the environments in which LPG outperformed or matched A2C)
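The C(32)-C(64)-C(64)-D(512) notation fixes only the channel and unit counts; the kernel sizes and strides below (8×8/4, 4×4/2, 3×3/1) are the standard Atari DQN values and are an assumption on my part. This sketch just checks the feature-map shapes such a torso would produce for an 84×84 input before the dense layer.

```python
# Shape check for an assumed C(32)-C(64)-C(64)-D(512) Atari torso.
# Kernel sizes and strides are the classic DQN values (an assumption);
# only the channel counts come from the paper's notation.

def conv_out(size, kernel, stride):
    """Spatial output size of a VALID (no-padding) convolution."""
    return (size - kernel) // stride + 1

def torso_shapes(size=84):
    shapes = []
    for channels, kernel, stride in [(32, 8, 4), (64, 4, 2), (64, 3, 1)]:
        size = conv_out(size, kernel, stride)
        shapes.append((size, size, channels))
    # Flattened feature count feeding the D(512) dense layer.
    flat = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
    return shapes, flat
```

Under these assumptions the three conv stages give 20×20×32, 9×9×64, and 7×7×64 maps, so the dense layer sees 3136 flattened features.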