CeeChess 1.4 - The tuning update

@bctboi23 bctboi23 released this 16 Feb 07:25
· 13 commits to master since this release
3d53576

+150 Elo in self-play (1 s per move)

  • Increased Futility and Reverse Futility Pruning Depths
  • Tweaked LMR
    • Cleaner, easier to understand code
    • Re-searches with a null window first before searching the full window
  • Added Second Set of Killer Moves
  • Added King Safety in the form of King Tropism
    • Extra bonus to diagonals in line with the king
    • Extra bonus to attack if enemy king is near semi-open files
    • Weighted by attacker's material
  • Evaluation tuned using logistic regression over a custom-constructed dataset, similar to the Texel method
    • Black-box tuning was done using simulated annealing plus local search, with a pseudo-Huber loss
      • Pseudo-Huber loss was used because the dataset likely contains outliers that would unfavorably skew the relatively simple evaluation function. I made this choice based on my understanding of the dataset, and it gave a marginal improvement in evaluation quality over the traditional MSE loss (roughly +10 Elo over 2000 games). If my evaluation were more complex, I might be more tempted to stay with MSE loss, provided on-board checkmates are removed from the dataset
  • Fixed some timeout bugs
  • Increased hash table stability
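
The tuning objective mentioned in the list above can be sketched roughly as follows. This is a minimal illustration, not CeeChess's actual tuner: the scale `k`, the `delta` value, the function names, and the tiny made-up sample are all assumptions, chosen only to show how a pseudo-Huber loss dampens an outlier compared with MSE.

```python
import math

def sigmoid(eval_cp, k=1.0 / 400.0):
    """Map a centipawn evaluation to an expected score in [0, 1].
    The scale k is a hypothetical value; Texel-style tuners usually fit it."""
    return 1.0 / (1.0 + 10.0 ** (-k * eval_cp))

def mse(residual):
    """Squared-error loss for a single residual."""
    return residual * residual

def pseudo_huber(residual, delta=0.5):
    """Pseudo-Huber loss: quadratic near zero, roughly linear for large residuals.
    delta is an assumed value, not one taken from the release notes."""
    return delta * delta * (math.sqrt(1.0 + (residual / delta) ** 2) - 1.0)

def mean_loss(positions, loss):
    """Average loss over (eval_cp, game_result) pairs, result in {0, 0.5, 1}."""
    return sum(loss(result - sigmoid(ev)) for ev, result in positions) / len(positions)

# Made-up sample: the last entry is an outlier (eval says winning, game was lost).
sample = [(120, 1.0), (-80, 0.0), (10, 0.5), (600, 0.0)]

print("MSE loss:         ", mean_loss(sample, mse))
print("pseudo-Huber loss:", mean_loss(sample, pseudo_huber))
```

An optimizer (the notes mention simulated annealing plus local search) would then adjust the evaluation parameters to minimize `mean_loss`; with the pseudo-Huber variant, a single badly mispredicted position pulls on the parameters far less than it would under MSE.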

Gauntlet results used for test ratings (1 min + 0.5 s increment), with Elo centered on the v1.4 release (ratings computed with BayesElo):

Rank  Name                        Elo    +    -  Games  Score  Oppo.  Draws
   1  Barbarossa-0.6.0             38   34   33    240    55%     95    23%
   2  CeeChess-v1.4                 0   13   13   1664    65%    -13    26%
   3  Barbarossa-0.5.0-win10-64   -34   33   33    240    45%     95    28%
   4  Kingfisher.v1.1.1          -107   32   33    240    34%     95    36%
   5  gopher_check               -146   34   35    238    29%     95    26%
   6  CeeChess 1.3.2             -149   34   36    238    29%     95    25%
...

Since CCRL ratings were adjusted downward recently (Stockfish went from 3900 CCRL to ~3630, as far as I know), this release no longer breaks the CCRL 2400 barrier. However, comparing the results here against the old ratings of Barbarossa-0.6.0 (2468), Barbarossa-0.5.0 (~2375, I believe), and the others suggests that it would have broken that barrier on the old scale. I now expect the engine to land in the 2300-2350 range, given that Barbarossa-0.6.0 has a new rating of 2355.
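
The estimate above is simple offset arithmetic, which can be checked with a short sketch. It assumes that gauntlet Elo differences transfer directly onto the CCRL scale, which is only approximately true; the names `gauntlet`, `old_ccrl`, `new_ccrl`, and `implied_rating` are illustrative, and the numbers are the ones quoted in these notes.

```python
# Gauntlet Elo relative to CeeChess v1.4, which is centered at 0.
gauntlet = {"Barbarossa-0.6.0": 38, "Barbarossa-0.5.0": -34}

# CCRL ratings as quoted in these notes (the 0.5.0 figure is approximate).
old_ccrl = {"Barbarossa-0.6.0": 2468, "Barbarossa-0.5.0": 2375}
new_ccrl = {"Barbarossa-0.6.0": 2355}

def implied_rating(anchor_rating, anchor_gauntlet_elo):
    """CeeChess sits at gauntlet Elo 0, so subtract the anchor's gauntlet offset."""
    return anchor_rating - anchor_gauntlet_elo

for name, rating in old_ccrl.items():
    print(f"old scale via {name}: {implied_rating(rating, gauntlet[name])}")
for name, rating in new_ccrl.items():
    print(f"new scale via {name}: {implied_rating(rating, gauntlet[name])}")
```

Under that assumption, both old-scale anchors put v1.4 slightly above 2400 (2430 and 2409), while the new Barbarossa-0.6.0 rating implies about 2317, which falls inside the 2300-2350 range given above.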