References

Itomigna2 edited this page Apr 24, 2024 · 11 revisions

Papers

Human-Timescale Adaptation in an Open-Ended Task Space (Adaptive agent, Ada)

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

Muesli: Combining Improvements in Policy Optimization

Podracer architectures for scalable Reinforcement Learning

Safe and efficient off-policy reinforcement learning (Retrace)

Off-Policy Actor-Critic with Shared Experience Replay (Replay buffer, V-trace)

Understanding Multi-Step Deep Reinforcement Learning (Off-policy correction)

Value-Aware Importance Weighting for Off-Policy Reinforcement Learning (Off-policy correction)

Sample Efficient Actor-Critic with Experience Replay

Multi-Step Reinforcement Learning: A Unifying Algorithm

Q(λ) with Off-Policy Corrections

Never Give Up: Learning Directed Exploration Strategies

Distributed training using actor-critic reinforcement learning with off-policy correction factors

Maximum a Posteriori Policy Optimisation (MPO)

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model (MuZero)

A Survey of Meta-Reinforcement Learning

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Transformers are Meta-Reinforcement Learners

Stabilizing Transformers for Reinforcement Learning

Repos

https://github.com/werner-duvaud/muzero-general

https://github.com/werner-duvaud/muzero-general/wiki/How-MuZero-works

https://github.com/facebookresearch/torchbeast/tree/main

https://github.com/ray-project/ray/tree/master/rllib/algorithms/impala

https://github.com/kimiyoung/transformer-xl

https://github.com/luckeciano/transformers-metarl