Thanks to the author of gym-2048 (https://github.com/rgal/gym-2048): the code is easy to understand and runs efficiently. I made a few small changes to make it a better RL environment, and implemented a DQN agent in PyTorch with the following tricks:
- Fill the replay buffer with random transitions before training;
- Soft target network updates;
- Epsilon decay;
- Gradient norm clipping;
- Double DQN;
- Prioritized Experience Replay;
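As a rough illustration of three of the tricks above (epsilon decay, soft target updates, and the Double DQN target), here is a minimal framework-free sketch in plain Python. The function names, the linear decay schedule, and the default hyperparameters are assumptions made for illustration; the PyTorch code in this repository may differ in all of them.

```python
def epsilon_by_step(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Linearly anneal epsilon from eps_start to eps_end over decay_steps.

    The linear schedule and default values are illustrative assumptions.
    """
    fraction = min(step / decay_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)


def soft_update(target_params, online_params, tau=0.01):
    """Soft target update: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_params, online_params)]


def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double DQN target for a single transition.

    The online network selects the next action (argmax over next_q_online),
    and the target network evaluates it (lookup in next_q_target).
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]
```

With `next_q_online = [0.2, 0.9, 0.1]` and `next_q_target = [5.0, 2.0, 7.0]`, the Double DQN target uses the target network's value for action 1 (2.0) rather than its own maximum (7.0), which is what reduces the overestimation of plain DQN.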
As a baseline, I evaluated a random policy over 1000 episodes.
The evaluation main function is in base_agent.py.
average episode time: 0.10279795455932617 s;
average step time: 0.7373 ms;
average highest score: 106.368;
average total score: 1078.252;
average steps: 139.417;
average episode time: 0.03773710775375366 s;
average step time: 0.2671 ms;
average highest score: 108.24;
average total score: 1102.088;
average steps: 141.288;
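The random-policy baseline above can be reproduced with a loop along the following lines. This is only a sketch: it assumes a classic Gym-style interface (`reset()` returning an observation, `step(action)` returning `(obs, reward, done, info)`), and `ToyEnv` and `evaluate_random_policy` are hypothetical names, not the code in base_agent.py.

```python
import random
import time


class ToyEnv:
    """Hypothetical stand-in for the 2048 environment (Gym-style API)."""
    n_actions = 4  # up, down, left, right

    def __init__(self, episode_length=10):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_length
        return self.t, 1.0, done, {}


def evaluate_random_policy(env, episodes=1000, seed=0):
    """Run a uniformly random policy and report per-episode averages."""
    rng = random.Random(seed)
    total_score = 0.0
    total_steps = 0
    start = time.perf_counter()
    for _ in range(episodes):
        env.reset()
        done = False
        while not done:
            action = rng.randrange(env.n_actions)
            _, reward, done, _ = env.step(action)
            total_score += reward
            total_steps += 1
    elapsed = time.perf_counter() - start
    return {
        "average total score": total_score / episodes,
        "average steps": total_steps / episodes,
        "average episode time (s)": elapsed / episodes,
        "average step time (ms)": 1000 * elapsed / max(total_steps, 1),
    }
```

Swapping `ToyEnv` for the real environment (and tracking the highest tile per episode) would yield the same kinds of statistics as those listed above.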
After training for 45k episodes, the best mean evaluation score was 7700 (averaged over 50 evaluation episodes).
- add a maximum number of steps and a maximum number of illegal steps per episode;
- add the DQN agent and training information;
- fix a bug in the Double Q-learning trick (the issue raised by mythsman);
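The per-episode step limits mentioned in the first item can be sketched as a thin wrapper around the environment. This is an assumption-laden illustration, not the repository's implementation: `StepLimit` and `CounterEnv` are hypothetical names, the limits are arbitrary, and a move is treated as illegal when it leaves the observation unchanged.

```python
class CounterEnv:
    """Hypothetical Gym-style stub; a frozen board never changes."""

    def __init__(self, frozen):
        self.frozen = frozen
        self.t = 0

    def reset(self):
        self.t = 0
        return (0,)

    def step(self, action):
        self.t += 1
        obs = (0,) if self.frozen else (self.t,)
        return obs, 0.0, False, {}


class StepLimit:
    """End an episode after max_steps moves or max_illegal_steps no-op moves."""

    def __init__(self, env, max_steps=1000, max_illegal_steps=10):
        self.env = env
        self.max_steps = max_steps
        self.max_illegal_steps = max_illegal_steps
        self.steps = 0
        self.illegal_steps = 0
        self.last_obs = None

    def reset(self):
        self.steps = 0
        self.illegal_steps = 0
        self.last_obs = self.env.reset()
        return self.last_obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.steps += 1
        # Assumption: an illegal move leaves the board unchanged.
        # For NumPy boards, compare with numpy.array_equal instead of ==.
        if obs == self.last_obs:
            self.illegal_steps += 1
        self.last_obs = obs
        if (self.steps >= self.max_steps
                or self.illegal_steps >= self.max_illegal_steps):
            done = True
        return obs, reward, done, info
```

Capping illegal moves keeps an untrained agent from looping forever on a move that does nothing, which would otherwise stall both training and evaluation.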