Why do we need the features of all rounds to predict the final reward? #19

wongsingfo · 2022-08-24T10:12:20Z

Both Mortal and Suphx [1] use a global reward predictor to predict the final game reward at the start of the i-th round. The predictor takes as input the features (i.e., the scores of the 4 players, grand_kyoku, honba, and kyotaku) of not only the i-th round but also all previous rounds.
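To make the two candidate inputs concrete, here is a minimal sketch. It assumes a hypothetical `RoundFeatures` record: the field names follow the post, but the layout, the helper functions, and the example values are made up for illustration and are not taken from Mortal's or Suphx's code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RoundFeatures:
    """State at the start of one round (hypothetical layout)."""
    scores: Tuple[int, int, int, int]  # scores of the 4 players
    grand_kyoku: int                   # round index within the game (assumed meaning)
    honba: int                         # repeat counter
    kyotaku: int                       # riichi sticks left on the table

    def to_vector(self) -> List[float]:
        return [*map(float, self.scores), float(self.grand_kyoku),
                float(self.honba), float(self.kyotaku)]


def history_input(rounds: List[RoundFeatures], i: int) -> List[List[float]]:
    """Sequence input: features of rounds 1..i (1-indexed), as described above."""
    return [r.to_vector() for r in rounds[:i]]


def current_only_input(rounds: List[RoundFeatures], i: int) -> List[float]:
    """Alternative input: features of the i-th round alone."""
    return rounds[i - 1].to_vector()


# Made-up example game, three rounds in.
game = [
    RoundFeatures((25000, 25000, 25000, 25000), 0, 0, 0),
    RoundFeatures((33000, 25000, 17000, 25000), 1, 0, 0),
    RoundFeatures((33000, 24000, 17000, 25000), 2, 1, 1),
]
seq = history_input(game, 3)       # what a sequence model would see
cur = current_only_input(game, 3)  # what a history-free model would see
```

The question in this thread is whether a predictor given `seq` can do better than one given only `cur`.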

I am wondering why we need the features from before the i-th round. I think that, given the features of the i-th round, the earlier features are irrelevant to the final reward. In other words, no matter how well or how poorly the player performed from the first round to the (i-1)-th round, the expected final ranking should be the same as long as the features of the i-th round are the same.
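This claim can be written as a conditional-independence (Markov) assumption. The notation is introduced here only for illustration: $x_k$ denotes the features at the start of round $k$ and $R$ the final reward.

```latex
\mathbb{E}\left[ R \mid x_1, x_2, \dots, x_i \right] = \mathbb{E}\left[ R \mid x_i \right]
```

If the equality holds, a predictor that sees only $x_i$ loses nothing compared with one that sees the whole history.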

[1] Suphx: Mastering Mahjong with Deep Reinforcement Learning. arXiv preprint arXiv:2003.13590, 2020. Section 3.2.

Equim-chan (Maintainer) · 2022-08-24T11:10:14Z

I'm not sure. I followed Suphx's method because it had been shown to work. You could run an experiment: replace the GRU part with a 2-layer MLP with the same number of parameters, and see whether the performance is the same.
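For anyone trying this, a rough sketch of how the MLP width could be chosen to match the GRU's parameter count. The sizes below are hypothetical and not taken from Mortal or Suphx, the GRU count uses two bias vectors per gate (the convention of `torch.nn.GRU`), and any output head on top of the GRU is ignored.

```python
def gru_params(input_size: int, hidden_size: int) -> int:
    """Parameters of one GRU layer: 3 gates, each with input weights,
    recurrent weights, and two bias vectors."""
    return 3 * hidden_size * (input_size + hidden_size + 2)


def mlp2_params(input_size: int, hidden_size: int, output_size: int) -> int:
    """Parameters of Linear(input, hidden) followed by Linear(hidden, output)."""
    return hidden_size * (input_size + 1) + output_size * (hidden_size + 1)


def matching_mlp_width(target: int, input_size: int, output_size: int) -> int:
    """Hidden width whose 2-layer MLP parameter count is closest to target."""
    # mlp2_params is linear in the width: width * (in + out + 1) + out
    width = round((target - output_size) / (input_size + output_size + 1))
    return max(1, width)


# Hypothetical sizes, chosen only for illustration.
INPUT_SIZE, GRU_HIDDEN, OUTPUT_SIZE = 7, 64, 1

target = gru_params(INPUT_SIZE, GRU_HIDDEN)
width = matching_mlp_width(target, INPUT_SIZE, OUTPUT_SIZE)
print(target, width, mlp2_params(INPUT_SIZE, width, OUTPUT_SIZE))
```

For the comparison to answer the original question, the MLP would presumably be fed only the i-th round's features, while the GRU keeps receiving the full sequence.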

hyskylord · 2023-11-01T13:00:03Z

I think the assumption here is that a player (human or AI) tends to use the same strategy in all rounds, so you can predict a player's behaviour from their actions in previous rounds.
