
v0.5.0

Released by @muupan on 15 Nov 08:05 · commit fce10e4

Important enhancements

  • Batch synchronized training using multiple environment instances and a single GPU is supported for some agents (see the sketch after this list):
    • A2C (added as chainerrl.agents.A2C)
    • PPO
    • DQN and agents that inherit from it, except SARSA
  • examples/ale/train_dqn_ale.py now follows the "Tuned DoubleDQN" setting by default and supports prioritized experience replay as an option
  • examples/atari/train_dqn.py is added as a basic example of applying DQN to Atari.
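
For reference, here is a minimal sketch of the new batch training path with DQN. The names MultiprocessVectorEnv and train_agent_batch_with_evaluation, and their keyword arguments, are assumed to match this release's source and may differ slightly:

```python
# A sketch of batch synchronized training, assuming the chainerrl API
# names at this release (keyword arguments are assumptions, not a
# guaranteed stable interface).
import chainer
import gym
import chainerrl


def make_env():
    env = gym.make('CartPole-v0')
    # Cast float64 observations to float32 for Chainer (wrapper from #160).
    return chainerrl.wrappers.CastObservationToFloat32(env)


sample_env = make_env()
obs_size = sample_env.observation_space.low.size
n_actions = sample_env.action_space.n

q_func = chainerrl.q_functions.FCStateQFunctionWithDiscreteAction(
    obs_size, n_actions, n_hidden_channels=50, n_hidden_layers=2)
opt = chainer.optimizers.Adam(eps=1e-2)
opt.setup(q_func)

agent = chainerrl.agents.DQN(
    q_func, opt,
    chainerrl.replay_buffer.ReplayBuffer(capacity=10 ** 5),
    gamma=0.99,
    explorer=chainerrl.explorers.ConstantEpsilonGreedy(
        0.1, sample_env.action_space.sample),
    replay_start_size=500,
    minibatch_size=32)

# All environment instances step in lockstep, and the agent acts on the
# whole batch of observations at once, so a single model (on one GPU or
# the CPU) is shared across environments.
vec_env = chainerrl.envs.MultiprocessVectorEnv(
    [make_env for _ in range(8)])

chainerrl.experiments.train_agent_batch_with_evaluation(
    agent=agent, env=vec_env, steps=10 ** 5, outdir='results',
    eval_n_runs=10, eval_interval=10 ** 4)
```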

Important bugfixes

  • A bug in chainerrl.agents.CategoricalDQN that degraded performance is fixed
  • A bug in atari_wrappers.LazyFrames that unnecessarily increased memory usage is fixed

Important destructive changes

  • chainerrl.replay_buffer.PrioritizedReplayBuffer and chainerrl.replay_buffer.PrioritizedEpisodicReplayBuffer are updated:
    • they are now FIFO (first in, first out), which reduces memory usage in Atari games
    • they compute priorities more closely following the paper
  • eval_explorer argument of chainerrl.experiments.train_agent_* is dropped (use chainerrl.wrappers.RandomizeAction for evaluation-time epsilon-greedy; see the sketch after this list)
  • The interface of chainerrl.agents.PPO has changed significantly
  • Support for Chainer v2 is dropped
  • Support for gym<0.9.7 is dropped
  • Support for loading replay buffers saved by chainerrl<=0.2.0 is dropped
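
A minimal sketch of evaluation-time epsilon-greedy with the new wrapper (the random_fraction keyword is assumed from this release's source):

```python
# Sketch: replacing the dropped eval_explorer argument with
# chainerrl.wrappers.RandomizeAction on the evaluation environment.
import gym
import chainerrl

eval_env = gym.make('CartPole-v0')
# With probability 0.05 the wrapper substitutes a uniformly random action
# for the agent's action, reproducing evaluation-time epsilon-greedy.
eval_env = chainerrl.wrappers.RandomizeAction(eval_env, random_fraction=0.05)
```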

All updates

Enhancement

  • A2C (#149, thanks @iory!)
  • Add wrappers to cast observations (#160)
  • Fix on flake8 3.5.0 (#214)
  • Use ()-shaped array for scalar loss (#219)
  • FIFO prioritized replay buffer (#277)
  • Update Policy class to inherit ABCMeta (#280, thanks @uidilr!)
  • Batch PPO Implementation (#295, thanks @ljvmiranda921!)
  • Mimic the details of prioritized experience replay (#301)
  • Add ScaleReward wrapper (#304)
  • Remove GaussianPolicy and obsolete policies (#305)
  • Make random access queue sampling code cleaner (#309)
  • Support gym==0.10.8 (#324)
  • Batch A2C/PPO/DQN (#326)
  • Use RandomizeAction wrapper instead of Explorer in evaluation (#328)
  • remove duplicate lines (typo) (#329, thanks @monado3!)
  • Merge consecutive with statements (#333)
  • Use Variable.array instead of Variable.data (#336)
  • Remove code for Chainer v2 (#337)
  • Implement getitem for ActionValue (#339)
  • Count updates of DQN (#341)
  • Move Atari Wrappers (#349)
  • Render wrapper (#350)

Documentation

  • fixes minor typos (#306)
  • fixes typo (#307)
  • Typos (#308)
  • fixes readme typo (#310)
  • Adds partial list of paper implementations with links to the main README (#311)
  • Adds another paper to list (#312)
  • adds some instructions regarding testing for potential contributors (#315)
  • Remove duplication of DQN in docs (#334)
  • nit on grammar of a comment (#354)

Examples

  • Tuned DoubleDQN with prioritized experience replay (#302)
  • adds some descriptions to parseargs arguments (#319)
  • Make clip_eps positive (#340)
  • updates env in ddpg example (#345)
  • Examples (#348)

Testing

  • Fix Travis CI errors (#318)
  • Parse Chainer version with packaging.version (#322)
  • removes tests for old replay buffer (#347)

Bugfixes

  • Fix the error caused by inexact delta_z (#314)
  • Stop caching the result of numpy.concatenate in LazyFrames (#332)