References
Human-Timescale Adaptation in an Open-Ended Task Space (Adaptive Agent, AdA)
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
Muesli: Combining Improvements in Policy Optimization
Podracer architectures for scalable Reinforcement Learning
Safe and efficient off-policy reinforcement learning (Retrace)
Off-Policy Actor-Critic with Shared Experience Replay (Replay buffer, V-trace)
Understanding Multi-Step Deep Reinforcement Learning (Off-policy correction)
Value-Aware Importance Weighting for Off-Policy Reinforcement Learning (Off-policy correction)
Sample Efficient Actor-Critic with Experience Replay
Multi-Step Reinforcement Learning: A Unifying Algorithm
Q(λ) with Off-Policy Corrections
Never Give Up: Learning Directed Exploration Strategies
Distributed training using actor-critic reinforcement learning with off-policy correction factors
Maximum a Posteriori Policy Optimisation (MPO)
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model (MuZero)
A Survey of Meta-Reinforcement Learning
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Transformers are Meta-Reinforcement Learners
Stabilizing Transformers for Reinforcement Learning
https://github.com/werner-duvaud/muzero-general
https://github.com/werner-duvaud/muzero-general/wiki/How-MuZero-works
https://github.com/facebookresearch/torchbeast/tree/main
https://github.com/ray-project/ray/tree/master/rllib/algorithms/impala