Replies: 2 comments
-
For Mortal, I think the relatively large amount of data from lower-level human games has a negative effect on training. After all, grp is far too imprecise; its accuracy is only 0.23. I don't think using only recent game data is a good approach on its own, but it could be combined with other methods. You may want to refer to my filtering process, though many parts of it are rough:

1. Removed Phoenix-table logs from the first few months of 2010, because the daily game count was low.
2. Games with a high same-table rate and a high average score need to be removed. (I haven't finished this.)
3. Computed the pt score of every player's Phoenix-table games. (If you need it, you can apply a small weight based on the game date when computing the score.)
4. Used a sliding-window algorithm over windows of 300 consecutive games (allowed to extend incrementally up to 350, in steps of 10) to find the window with the maximum pt score, then recorded and removed it.
5. Ranked all recorded game windows and took a specified number of records as needed; I selected by pt. 201k in total, 198k after deduplication.
6. Set the player names in the training set to a designated name, then filtered out data with other names during training.
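Steps 4–5 above could be sketched roughly like this. This is a minimal illustration under my own assumptions, not the author's actual script: the function name `select_windows`, the input format (a chronological list of per-game pt deltas for one player), and the choice to let games across a removed window's gap become adjacent are all guesses at the intended semantics.

```python
def select_windows(pt_deltas, base=300, max_len=350, step=10):
    """Repeatedly find the consecutive run of games (length `base`,
    extendable to `max_len` in steps of `step`) with the highest total pt,
    record it, and remove those games, until fewer than `base` remain.
    NOTE: names and removal semantics are assumptions, not the real script.
    """
    remaining = list(enumerate(pt_deltas))  # keep original game indices
    windows = []
    while len(remaining) >= base:
        best = None  # (total_pt, start, length)
        for length in range(base, min(max_len, len(remaining)) + 1, step):
            # rolling sum over all windows of this length
            total = sum(pt for _, pt in remaining[:length])
            for start in range(len(remaining) - length + 1):
                if start:
                    total += remaining[start + length - 1][1] - remaining[start - 1][1]
                if best is None or total > best[0]:
                    best = (total, start, length)
        total, start, length = best
        windows.append((total, [i for i, _ in remaining[start:start + length]]))
        del remaining[start:start + length]  # remove and continue (step 4)
    windows.sort(key=lambda w: w[0], reverse=True)  # rank by pt (step 5)
    return windows
```

From the ranked output one would then keep the top-N windows by pt, matching the "take a specified number of records" step.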
-
My opinion is that bad data quality should have a negative impact on train_grp, so a better approach might be to train grp on data produced by self-play of a well-trained Mortal. As for the network itself, Mortal might benefit more from quantity, to handle certain edge cases.
-
I have a few questions and thoughts regarding the quality and quantity of the game data being used in model training, particularly with respect to the Tenhou Phoenix table data.
Dataset Quality
From the config file, I noticed that Mortal may use data from the Tenhou Phoenix table. I am curious how the quality of the game data affects the model's performance. Specifically, if the average skill level of human players in 2010 was lower than in recent years, would it be beneficial to train only on recent game data?
Dataset Quantity
More generally, if we have a relatively large amount of data from lower-level human games, will it contribute positively or negatively to the model's training? How do we balance the quantity of data against its quality to ensure optimal model performance?