
v0.5.0

Released by @muupan on 15 Nov 08:05 · commit fce10e4

Important enhancements

  • Batch synchronized training using multiple environment instances and a single GPU is supported for some agents (see the sketch after this list):
    • A2C (added as chainerrl.agents.A2C)
    • PPO
    • DQN and agents that inherit from it, except SARSA
  • examples/ale/train_dqn_ale.py now follows the "Tuned DoubleDQN" setting by default and supports prioritized experience replay as an option
  • examples/atari/train_dqn.py is added as a basic example of applying DQN to Atari.
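
For reference, here is a minimal sketch of the new batch training path with DQN. The names MultiprocessVectorEnv and train_agent_batch_with_evaluation, and their keyword arguments, are assumed to match this release's source and may differ slightly:

```python
# A sketch of batch synchronized training, assuming the chainerrl API
# names at this release (keyword arguments are assumptions, not a
# guaranteed stable interface).
import chainer
import gym
import chainerrl


def make_env():
    env = gym.make('CartPole-v0')
    # Cast float64 observations to float32 for Chainer (wrapper from #160).
    return chainerrl.wrappers.CastObservationToFloat32(env)


sample_env = make_env()
obs_size = sample_env.observation_space.low.size
n_actions = sample_env.action_space.n

q_func = chainerrl.q_functions.FCStateQFunctionWithDiscreteAction(
    obs_size, n_actions, n_hidden_channels=50, n_hidden_layers=2)
opt = chainer.optimizers.Adam(eps=1e-2)
opt.setup(q_func)

agent = chainerrl.agents.DQN(
    q_func, opt,
    chainerrl.replay_buffer.ReplayBuffer(capacity=10 ** 5),
    gamma=0.99,
    explorer=chainerrl.explorers.ConstantEpsilonGreedy(
        0.1, sample_env.action_space.sample),
    replay_start_size=500,
    minibatch_size=32)

# All environment instances step in lockstep, and the agent acts on the
# whole batch of observations at once, so a single model (on one GPU or
# the CPU) is shared across environments.
vec_env = chainerrl.envs.MultiprocessVectorEnv(
    [make_env for _ in range(8)])

chainerrl.experiments.train_agent_batch_with_evaluation(
    agent=agent, env=vec_env, steps=10 ** 5, outdir='results',
    eval_n_runs=10, eval_interval=10 ** 4)
```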

Important bugfixes

  • A bug in chainerrl.agents.CategoricalDQN that degraded performance is fixed
  • A bug in atari_wrappers.LazyFrames that unnecessarily increased memory usage is fixed

Important destructive changes

  • chainerrl.replay_buffer.PrioritizedReplayBuffer and chainerrl.replay_buffer.PrioritizedEpisodicReplayBuffer are updated:
    • they are now FIFO (first in, first out), which reduces memory usage in Atari games
    • they compute priorities more closely following the paper
  • eval_explorer argument of chainerrl.experiments.train_agent_* is dropped (use chainerrl.wrappers.RandomizeAction for evaluation-time epsilon-greedy; see the sketch after this list)
  • The interface of chainerrl.agents.PPO has changed significantly
  • Support for Chainer v2 is dropped
  • Support for gym<0.9.7 is dropped
  • Support for loading replay buffers saved by chainerrl<=0.2.0 is dropped
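
A minimal sketch of evaluation-time epsilon-greedy with the new wrapper (the random_fraction keyword is assumed from this release's source):

```python
# Sketch: replacing the dropped eval_explorer argument with
# chainerrl.wrappers.RandomizeAction on the evaluation environment.
import gym
import chainerrl

eval_env = gym.make('CartPole-v0')
# With probability 0.05 the wrapper substitutes a uniformly random action
# for the agent's action, reproducing evaluation-time epsilon-greedy.
eval_env = chainerrl.wrappers.RandomizeAction(eval_env, random_fraction=0.05)
```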

All updates

Enhancement

  • A2C (#149, thanks @iory!)
  • Add wrappers to cast observations (#160)
  • Fix on flake8 3.5.0 (#214)
  • Use ()-shaped array for scalar loss (#219)
  • FIFO prioritized replay buffer (#277)
  • Update Policy class to inherit ABCMeta (#280, thanks @uidilr!)
  • Batch PPO Implementation (#295, thanks @ljvmiranda921!)
  • Mimic the details of prioritized experience replay (#301)
  • Add ScaleReward wrapper (#304)
  • Remove GaussianPolicy and obsolete policies (#305)
  • Make random access queue sampling code cleaner (#309)
  • Support gym==0.10.8 (#324)
  • Batch A2C/PPO/DQN (#326)
  • Use RandomizeAction wrapper instead of Explorer in evaluation (#328)
  • remove duplicate lines (typo) (#329, thanks @monado3!)
  • Merge consecutive with statements (#333)
  • Use Variable.array instead of Variable.data (#336)
  • Remove code for Chainer v2 (#337)
  • Implement getitem for ActionValue (#339)
  • Count updates of DQN (#341)
  • Move Atari Wrappers (#349)
  • Render wrapper (#350)

Documentation

  • fixes minor typos (#306)
  • fixes typo (#307)
  • Typos (#308)
  • fixes readme typo (#310)
  • Adds partial list of paper implementations with links to the main README (#311)
  • Adds another paper to list (#312)
  • adds some instructions regarding testing for potential contributors (#315)
  • Remove duplication of DQN in docs (#334)
  • nit on grammar of a comment (#354)

Examples

  • Tuned DoubleDQN with prioritized experience replay (#302)
  • adds some descriptions to parseargs arguments (#319)
  • Make clip_eps positive (#340)
  • updates env in ddpg example (#345)
  • Examples (#348)

Testing

  • Fix Travis CI errors (#318)
  • Parse Chainer version with packaging.version (#322)
  • removes tests for old replay buffer (#347)

Bugfixes

  • Fix the error caused by inexact delta_z (#314)
  • Stop caching the result of numpy.concatenate in LazyFrames (#332)