https://arxiv.org/abs/2002.12928
A Self-Tuning Actor-Critic Algorithm (Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh)
Self-Tuning Deep Reinforcement Learning (Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Hado van Hasselt, David Silver, Satinder Singh)
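I don't know RL well, but this looks interesting enough. It is an attempt to fold hyperparameter tuning into the RL training process itself. The core idea seems to be attaching meta-gradients to IMPALA, then improving and extending that setup.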
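
To make "meta-gradient" concrete for myself, here is a minimal toy sketch, not the paper's algorithm and with no IMPALA or actor-critic in it. Everything in it is an assumption for illustration: a scalar parameter `theta`, a scalar hyperparameter `eta` that shapes the inner loss, a fixed `target` for the outer loss, and hand-derived gradients. The only point is the mechanism: update `theta` with the inner loss, then differentiate the outer loss through that update to adjust `eta`.

```python
def inner_update(theta, eta, alpha):
    # Inner loss: 0.5 * (theta - eta)^2. Here eta stands in for a tunable
    # hyperparameter (hypothetical; the paper tunes loss hyperparameters).
    grad_theta = theta - eta
    return theta - alpha * grad_theta


def meta_gradient(theta_new, alpha, target):
    # Outer loss: 0.5 * (theta_new - target)^2.
    # Chain rule through the inner update:
    #   dJ/d(eta) = dJ/d(theta_new) * d(theta_new)/d(eta)
    #             = (theta_new - target) * alpha
    return (theta_new - target) * alpha


def run_meta_gradient(steps=200, alpha=0.5, beta=0.5, target=3.0):
    theta, eta = 0.0, 0.0
    for _ in range(steps):
        # Inner step: move theta using the eta-dependent loss.
        theta = inner_update(theta, eta, alpha)
        # Outer step: move eta along the meta-gradient of the outer loss.
        eta -= beta * meta_gradient(theta, alpha, target)
    return theta, eta


if __name__ == "__main__":
    theta, eta = run_meta_gradient()
    print(theta, eta)
```

With these settings both `theta` and `eta` approach `target`: the hyperparameter is tuned online, during training, rather than by an outer search loop. In practice the derivative through the update would come from automatic differentiation rather than a hand-written chain rule.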
#reinforcement_learning #hyperparameter #optimization #meta_learning