Released 15 Nov 08:05
Important enhancements
Batch synchronized training using multiple environment instances and a single GPU is supported for some agents:
A2C (added as chainerrl.agents.A2C)
DQN and other agents that inherit DQN, except SARSA
now follows the "Tuned DoubleDQN" setting by default and supports prioritized experience replay as an option
is added as a basic example of applying DQN to Atari.
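Batch synchronized training, as described above, steps several environment instances in lockstep and chooses the actions for all of them in one batched call, which is what lets a single GPU serve many environments. The sketch below shows only that control flow. ToyEnv, batch_act and run_synchronized are hypothetical names for illustration, not ChainerRL's API, and the "policy" is a threshold rule standing in for a batched network forward pass.

```python
import random
from typing import List, Tuple


class ToyEnv:
    """Hypothetical stand-in for a gym-style environment (not part of ChainerRL)."""

    def __init__(self, seed: int, episode_length: int = 5) -> None:
        self.rng = random.Random(seed)
        self.episode_length = episode_length
        self.t = 0

    def reset(self) -> float:
        self.t = 0
        return self.rng.random()

    def step(self, action: int) -> Tuple[float, float, bool]:
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= self.episode_length
        return self.rng.random(), reward, done


def batch_act(observations: List[float]) -> List[int]:
    """Hypothetical batched policy: one call decides actions for every env.

    In a real agent this is where a single batched forward pass on the GPU
    would happen; here it is a simple threshold rule.
    """
    return [1 if obs > 0.5 else 0 for obs in observations]


def run_synchronized(num_envs: int, num_steps: int) -> List[float]:
    """Step all environments together and return finished episode returns."""
    envs = [ToyEnv(seed=i) for i in range(num_envs)]
    observations = [env.reset() for env in envs]
    episode_returns = [0.0] * num_envs
    finished_returns: List[float] = []
    for _ in range(num_steps):
        # One batched decision for all environments at this time step.
        actions = batch_act(observations)
        for i, (env, action) in enumerate(zip(envs, actions)):
            obs, reward, done = env.step(action)
            episode_returns[i] += reward
            if done:
                # Each environment resets independently; the batch keeps going.
                finished_returns.append(episode_returns[i])
                episode_returns[i] = 0.0
                obs = env.reset()
            observations[i] = obs
    return finished_returns
```

With four environments, episodes of length 5 and 10 synchronized steps, every environment finishes two episodes, so run_synchronized(4, 10) yields eight episode returns. The design choice is that environments never run ahead of each other: the batch size seen by the policy is always num_envs.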
Important bugfixes
A bug in chainerrl.agents.CategoricalDQN that degrades performance is fixed
A bug in atari_wrappers.LazyFrames that unnecessarily increases memory usage is fixed
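The LazyFrames fix concerns caching (see #332 in the list of updates: "Stop caching the result of numpy.concatenate in LazyFrames"). A lazily stacked observation only holds references to the last k frames, so consecutive observations stored in a replay buffer share frames. If the stacked array is cached inside the object, every stored observation also keeps its own full copy, which defeats the sharing. The sketch below is a pure-Python illustration under that reading, using lists instead of NumPy arrays; FrameStack, materialize and frames are hypothetical names, not ChainerRL's code.

```python
from collections import deque
from typing import Deque, List, Sequence


class LazyFrames:
    """Minimal sketch of a lazily stacked observation (not ChainerRL's code).

    It stores references to the last k frames. The stacked result is rebuilt
    on every access and deliberately NOT cached: a cached copy would live as
    long as the object and cost k frames of extra memory per stored observation.
    """

    def __init__(self, frames: Sequence[List[int]]) -> None:
        self._frames = list(frames)  # copies references only, not pixel data

    def materialize(self) -> List[int]:
        # Concatenate on demand; the result is returned, never stored.
        stacked: List[int] = []
        for frame in self._frames:
            stacked.extend(frame)
        return stacked

    def frames(self) -> List[List[int]]:
        return self._frames


class FrameStack:
    """Hypothetical helper that keeps the last k frames of an episode."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.buffer: Deque[List[int]] = deque(maxlen=k)

    def reset(self) -> None:
        self.buffer.clear()

    def push(self, frame: List[int]) -> LazyFrames:
        if not self.buffer:
            # At episode start, repeat the first frame to fill the stack.
            for _ in range(self.k - 1):
                self.buffer.append(frame)
        self.buffer.append(frame)
        return LazyFrames(self.buffer)
```

For example, after stack = FrameStack(k=4), obs1 = stack.push([1, 1]) and obs2 = stack.push([2, 2]), the two observations share three frame objects by reference, and each call to materialize() builds a fresh list instead of keeping one alive inside the object.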
Important destructive changes
and chainerrl.replay_buffer.PrioritizedEpisodicReplayBuffer are updated:
become FIFO (First In, First Out), reducing memory usage in Atari games
compute priorities more closely following the paper
argument of chainerrl.experiments.train_agent_* is dropped (use chainerrl.wrappers.RandomizeAction for evaluation-time epsilon-greedy exploration)
The interface of chainerrl.agents.PPO has changed substantially
Support for Chainer v2 is dropped
Support for gym<0.9.7 is dropped
Support for loading replay buffers saved by chainerrl<=0.2.0 is dropped
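The replay buffer changes above combine two ideas: FIFO eviction (the oldest transition is dropped once capacity is reached) and proportional prioritization as described in the prioritized experience replay paper. The following is a minimal sketch under those assumptions, not ChainerRL's implementation: new transitions receive the maximum priority seen so far, priorities are (|TD error| + eps) ** alpha, sampling is proportional to priority, and importance-sampling weights are (N * P(i)) ** (-beta) normalized by the largest weight in the buffer (one common choice). The class name and default hyperparameters are hypothetical, and the exact details ChainerRL adopted in #301 are not reproduced here.

```python
import random
from collections import deque
from typing import Any, Deque, List, Tuple


class FIFOPrioritizedBuffer:
    """Minimal sketch of a FIFO prioritized replay buffer (not ChainerRL's code)."""

    def __init__(self, capacity: int, alpha: float = 0.6, beta: float = 0.4,
                 eps: float = 1e-6) -> None:
        self.alpha = alpha
        self.beta = beta
        self.eps = eps
        # deque(maxlen=...) silently drops the oldest entry when full: FIFO.
        self.items: Deque[Any] = deque(maxlen=capacity)
        self.priorities: Deque[float] = deque(maxlen=capacity)
        self.max_priority = 1.0

    def append(self, transition: Any) -> None:
        # New transitions get the largest priority seen so far,
        # so each one is likely to be replayed at least once.
        self.items.append(transition)
        self.priorities.append(self.max_priority)

    def sample(self, n: int) -> Tuple[List[int], List[Any], List[float]]:
        size = len(self.items)
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        indices = random.choices(range(size), weights=probs, k=n)
        # The largest weight belongs to the least likely transition.
        max_weight = (size * min(probs)) ** (-self.beta)
        weights = [(size * probs[i]) ** (-self.beta) / max_weight for i in indices]
        return indices, [self.items[i] for i in indices], weights

    def update_priorities(self, indices: List[int], td_errors: List[float]) -> None:
        # Indices are current positions; call this before appending more items,
        # because FIFO eviction shifts every position by one.
        for i, err in zip(indices, td_errors):
            p = (abs(err) + self.eps) ** self.alpha
            self.priorities[i] = p
            self.max_priority = max(self.max_priority, p)
```

The FIFO behavior is what bounds memory: with capacity 3, appending five transitions leaves only the last three. Sampling here is O(N) per call for clarity; a practical buffer would use a sum tree.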
All updates
A2C (#149, thanks @iory!)
Add wrappers to cast observations (#160)
Fix on flake8 3.5.0 (#214)
Use ()-shaped array for scalar loss (#219)
FIFO prioritized replay buffer (#277)
Update Policy class to inherit ABCMeta (#280, thanks @uidilr!)
Batch PPO Implementation (#295, thanks @ljvmiranda921!)
Mimic the details of prioritized experience replay (#301)
Add ScaleReward wrapper (#304)
Remove GaussianPolicy and obsolete policies (#305)
Make random access queue sampling code cleaner (#309)
Support gym==0.10.8 (#324)
Batch A2C/PPO/DQN (#326)
Use RandomizeAction wrapper instead of Explorer in evaluation (#328)
remove duplicate lines (typo) (#329, thanks @monado3!)
Merge consecutive with statements (#333)
Use Variable.array instead of (#336)
Remove code for Chainer v2 (#337)
Implement getitem for ActionValue (#339)
Count updates of DQN (#341)
Move Atari Wrappers (#349)
Render wrapper (#350)
fixes minor typos (#306)
fixes typo (#307)
Typos (#308)
fixes readme typo (#310)
Adds partial list of paper implementations with links to the main README (#311)
Adds another paper to list (#312)
adds some instructions regarding testing for potential contributors (#315)
Remove duplication of DQN in docs (#334)
nit on grammar of a comment: (#354)
Tuned DoubleDQN with prioritized experience replay (#302)
adds some descriptions to parseargs arguments (#319)
Make clip_eps positive (#340)
updates env in ddpg example (#345)
Examples (#348)
Fix Travis CI errors (#318)
Parse Chainer version with packaging.version (#322)
removes tests for old replay buffer (#347)
Fix the error caused by inexact delta_z (#314)
Stop caching the result of numpy.concatenate in LazyFrames (#332)
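#328 above moves evaluation-time exploration out of the agent and into an environment wrapper: the agent acts greedily, and the wrapper replaces its action with a uniformly random one with some probability. The sketch below illustrates that idea for a discrete action space without depending on gym. RandomizeActionSketch, CountingEnv and the random_fraction parameter are hypothetical names chosen for illustration; they are not the signature of chainerrl.wrappers.RandomizeAction.

```python
import random
from typing import List, Optional, Tuple


class CountingEnv:
    """Hypothetical discrete-action environment that records received actions."""

    def __init__(self, n_actions: int) -> None:
        self.n_actions = n_actions
        self.received: List[int] = []

    def reset(self) -> int:
        return 0

    def step(self, action: int) -> Tuple[int, float, bool, dict]:
        self.received.append(action)
        return 0, 0.0, False, {}


class RandomizeActionSketch:
    """Env wrapper for evaluation-time epsilon-greedy (illustration only).

    With probability random_fraction, the agent's action is replaced by an
    action drawn uniformly from range(env.n_actions) before being executed.
    """

    def __init__(self, env: CountingEnv, random_fraction: float,
                 seed: Optional[int] = None) -> None:
        if not 0.0 <= random_fraction <= 1.0:
            raise ValueError("random_fraction must be in [0, 1]")
        self.env = env
        self.random_fraction = random_fraction
        self.rng = random.Random(seed)

    def reset(self) -> int:
        return self.env.reset()

    def step(self, action: int) -> Tuple[int, float, bool, dict]:
        if self.rng.random() < self.random_fraction:
            action = self.rng.randrange(self.env.n_actions)
        return self.env.step(action)
```

Keeping the randomization in the wrapper means the evaluation loop does not need a separate explorer object, which is consistent with the dropped train_agent_* argument mentioned earlier. Note that a random draw can coincide with the greedy action, so with 4 actions the executed action differs from the agent's about 0.75 * random_fraction of the time.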