v0.5.0
muupan released this on 15 Nov 08:05
Important enhancements
Batch synchronized training using multiple environment instances and a single GPU is now supported for some agents (see the sketch after this list):
A2C (added as chainerrl.agents.A2C)
PPO
DQN and other agents that inherit from DQN, except SARSA
examples/ale/train_dqn_ale.py now follows the "Tuned DoubleDQN" setting by default and supports prioritized experience replay as an option
examples/atari/train_dqn.py is added as a basic example of applying DQN to Atari.
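Batch training is driven by a vectorized environment instead of a single gym env. The snippet below is a minimal sketch for DQN on a toy task; the environment id, hyperparameters, and the exact keyword arguments of MultiprocessVectorEnv and train_agent_batch_with_evaluation are assumptions here, so refer to examples/atari/train_dqn.py and examples/ale/train_dqn_ale.py for the canonical usage.

```python
# Minimal sketch of batch synchronized training (keyword names are
# assumptions; see the shipped examples for the exact API).
import chainer
import gym
import numpy as np

import chainerrl

num_envs = 4
sample_env = gym.make('CartPole-v0')
obs_size = sample_env.observation_space.low.size
n_actions = sample_env.action_space.n

q_func = chainerrl.q_functions.FCStateQFunctionWithDiscreteAction(
    obs_size, n_actions, n_hidden_channels=50, n_hidden_layers=2)
opt = chainer.optimizers.Adam(eps=1e-2)
opt.setup(q_func)

agent = chainerrl.agents.DQN(
    q_func, opt,
    chainerrl.replay_buffer.ReplayBuffer(capacity=10 ** 5),
    gamma=0.99,
    explorer=chainerrl.explorers.ConstantEpsilonGreedy(
        epsilon=0.1, random_action_func=sample_env.action_space.sample),
    replay_start_size=500,
    target_update_interval=100,
    phi=lambda x: x.astype(np.float32, copy=False))

# Several environment instances are stepped synchronously as one batch.
vec_env = chainerrl.envs.MultiprocessVectorEnv(
    [(lambda: gym.make('CartPole-v0')) for _ in range(num_envs)])

chainerrl.experiments.train_agent_batch_with_evaluation(
    agent=agent, env=vec_env, steps=10 ** 5,
    eval_n_runs=10, eval_interval=10 ** 4, outdir='results')
```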
Important bugfixes
A bug in chainerrl.agents.CategoricalDQN that degraded performance has been fixed
A bug in atari_wrappers.LazyFrames that unnecessarily increased memory usage has been fixed
Important destructive changes
chainerrl.replay_buffer.PrioritizedReplayBuffer and chainerrl.replay_buffer.PrioritizedEpisodicReplayBuffer are updated:
they are now FIFO (First In, First Out), reducing memory usage in Atari games
they compute priorities more closely following the original paper
The eval_explorer argument of chainerrl.experiments.train_agent_* is dropped (use chainerrl.wrappers.RandomizeAction for evaluation-time epsilon-greedy; see the sketch after this list)
The interface of chainerrl.agents.PPO has changed significantly
Support for Chainer v2 is dropped
Support for gym<0.9.7 is dropped
Support for loading replay buffers saved by chainerrl<=0.2.0 is dropped
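To recover the old eval_explorer behaviour, wrap the evaluation environment so that a fraction of actions is replaced with random ones. A minimal sketch follows; the random_fraction keyword and the capacity argument of PrioritizedReplayBuffer are assumptions, so check the respective docstrings.

```python
# Minimal sketch of the new evaluation-time epsilon-greedy and the
# FIFO prioritized replay buffer (keyword names are assumptions).
import gym
import chainerrl

# With probability random_fraction, the wrapper overrides the agent's
# action with a uniformly random action, mimicking epsilon-greedy
# during evaluation (the role previously played by eval_explorer).
eval_env = chainerrl.wrappers.RandomizeAction(
    gym.make('CartPole-v0'), random_fraction=0.05)

# The updated prioritized replay buffer evicts the oldest transitions
# first (FIFO), so memory usage stays bounded by the capacity.
rbuf = chainerrl.replay_buffer.PrioritizedReplayBuffer(capacity=10 ** 6)
```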
All updates
Enhancement
A2C (#149, thanks @iory!)
Add wrappers to cast observations (#160)
Fix on flake8 3.5.0 (#214)
Use ()-shaped array for scalar loss (#219)
FIFO prioritized replay buffer (#277)
Update Policy class to inherit ABCMeta (#280, thanks @uidilr!)
Batch PPO Implementation (#295, thanks @ljvmiranda921!)
Mimic the details of prioritized experience replay (#301)
Add ScaleReward wrapper (#304)
Remove GaussianPolicy and obsolete policies (#305)
Make random access queue sampling code cleaner (#309)
Support gym==0.10.8 (#324)
Batch A2C/PPO/DQN (#326)
Use RandomizeAction wrapper instead of Explorer in evaluation (#328)
remove duplicate lines (typo) (#329, thanks @monado3!)
Merge consecutive with statements (#333)
Use Variable.array instead of Variable.data (#336)
Remove code for Chainer v2 (#337)
Implement getitem for ActionValue (#339)
Count updates of DQN (#341)
Move Atari Wrappers (#349)
Render wrapper (#350)
Documentation
fixes minor typos (#306)
fixes typo (#307)
Typos (#308)
fixes readme typo (#310)
Adds partial list of paper implementations with links to the main README (#311)
Adds another paper to list (#312)
adds some instructions regarding testing for potential contributors (#315)
Remove duplication of DQN in docs (#334)
nit on grammar of a comment: (#354)
Examples
Tuned DoubleDQN with prioritized experience replay (#302)
adds some descriptions to parseargs arguments (#319)
Make clip_eps positive (#340)
updates env in ddpg example (#345)
Examples (#348)
Testing
Fix Travis CI errors (#318)
Parse Chainer version with packaging.version (#322)
removes tests for old replay buffer (#347)
Bugfixes
Fix the error caused by inexact delta_z (#314)
Stop caching the result of numpy.concatenate in LazyFrames (#332)