AobaKomaochi is a distributed deep reinforcement learning project for Shogi handicap games, trained without human knowledge.


There are seven handicap settings: Lance (kyo ochi), Bishop (kaku ochi), Rook (hisha ochi), 2-Piece (ni-mai ochi), 4-Piece (yon-mai ochi), 6-Piece (roku-mai ochi), and No handicap (hirate).
The win rate is kept at 0.5 by weakening the Black (shitate, or sente) player.
Can the AI discover new openings, or rediscover the Two-Pawn Sacrifice Push, Silver Tandem, and so on?
If you are interested, please join us. Anyone can contribute using Google Colab.

GitHub Source and Windows binary. GitHub(Japanese top page)

2021-09-20 v23. kldgain option for training. Update required. w745, 7940000 games.
2021-08-05 Learning rate dropped to 0.0001 (from 3711k games, w321).
2021-06-28 v1.1. A softmax temperature > 1.0 is adjusted, even if moves <= 30. aobak version is 20. w92, 1430000 games.
2021-06-23 Windows version (v1.0) released.
2021-06-07 Fixed the Elo adjustment method.
2021-06-07 Bug fix: it sometimes failed to find a 1-ply mate.
2021-06-06 Website opened. Google Colab is available. Interestingly, at present, uwate (White)'s win rate is high in 6-Piece games. This is because, when pieces are moved almost randomly, the player with fewer pieces has more chances to capture pieces. AobaKomaochi uses the 27-point declaration rule. The removed pieces are counted towards uwate (White)'s total.
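
The point count mentioned in the last entry can be sketched as follows. This is a minimal illustration, not AobaKomaochi's code: it assumes the usual 27-point scale (rook and bishop 5 points, other pieces 1, king 0) and the standard sets of removed pieces for each handicap. The names PIECE_POINTS, REMOVED_PIECES, handicap_bonus and uwate_declaration_points are hypothetical.

```python
# Assumed point scale of the 27-point declaration rule:
# rook and bishop count 5, every other piece counts 1, the king counts 0.
PIECE_POINTS = {
    "rook": 5, "bishop": 5, "gold": 1, "silver": 1,
    "knight": 1, "lance": 1, "pawn": 1, "king": 0,
}

# Pieces taken off uwate's side in each handicap (standard definitions).
REMOVED_PIECES = {
    "No handicap": [],
    "Lance": ["lance"],
    "Bishop": ["bishop"],
    "Rook": ["rook"],
    "2-Piece": ["rook", "bishop"],
    "4-Piece": ["rook", "bishop", "lance", "lance"],
    "6-Piece": ["rook", "bishop", "lance", "lance", "knight", "knight"],
}


def handicap_bonus(handicap):
    """Points credited to uwate for the pieces removed by the handicap."""
    return sum(PIECE_POINTS[p] for p in REMOVED_PIECES[handicap])


def uwate_declaration_points(counted_pieces, handicap):
    """Uwate's total: points of the pieces that count for the declaration
    (hypothetical input: a list of piece names) plus the handicap bonus."""
    return sum(PIECE_POINTS[p] for p in counted_pieces) + handicap_bonus(handicap)


if __name__ == "__main__":
    for name in REMOVED_PIECES:
        print(f"{name:12s} bonus = {handicap_bonus(name)}")
```

Under these assumptions, uwate starts a 6-Piece game with a sizeable bonus already credited, which makes a declaration win easier to reach than the raw piece count suggests.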


2021-09-21 10:59 JST (updated every 30 minutes)
In the past hour: 20 clients, 3969 games.
In the past 24 hours: 22 clients, 35403 games.
Total: 8038347 games. Latest weight: w753. The next weight is due in 0.6 hours. Thank you for your contribution!
               In past 7000 games             In past 500,000 games
               Average   Sente     Draw       Average   Sente     Draw       Handicap
Handicap       of moves  winrate   rate       of moves  winrate   rate       ELO
No handicap    136.2     0.503     0.046      136.3     0.501     0.055      25
Lance          135.5     0.501     0.041      134.3     0.500     0.033      114
Bishop         132.2     0.454     0.015      133.5     0.498     0.018      389
Rook           122.5     0.517     0.015      123.0     0.499     0.013      430
2-Piece        114.0     0.518     0.004      117.4     0.500     0.004      598
4-Piece        101.8     0.527     0.003      105.7     0.502     0.003      674
6-Piece        101.9     0.519     0.003      101.4     0.505     0.001      794
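
To get a feel for the Handicap ELO values, the standard logistic Elo formula converts a rating gap into an expected score. The sketch below assumes each value is a rating gap in ordinary Elo points (the page does not define it precisely); expected_score is a hypothetical helper, not part of AobaKomaochi.

```python
def expected_score(elo_gap):
    """Expected score of the stronger side for a given Elo gap
    (standard logistic Elo model with a 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))


# Handicap ELO values copied from the table above.
HANDICAP_ELO = {
    "No handicap": 25, "Lance": 114, "Bishop": 389, "Rook": 430,
    "2-Piece": 598, "4-Piece": 674, "6-Piece": 794,
}

if __name__ == "__main__":
    for name, gap in HANDICAP_ELO.items():
        print(f"{name:12s} {gap:4d} Elo -> expected score {expected_score(gap):.3f}")
```

Read this way, a larger Handicap ELO means Black has to be weakened more before the win rate settles at 0.5.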

Elo progress: self-play matches at 1 playout/move (left vertical axis), and matches against Kristallweizen (6.00) at 20k/move (right vertical axis). The right-axis Elo is based on floodgate.
As of 2021-09-20.

AobaKomaochi at 100 playouts/move vs. Kristallweizen (6.00) at 20k/move, 400 games.


Game samples without noise. There is no Elo adjustment for the Black player, so Black tends to win.
Self-play games without noise. Each game uses the same weight.

You can see the transition of opening moves.


Self-play games for training.
One sample per weight, updated daily.

Because of the added randomness, it often blunders in the first 30 moves. Black's strength is also adjusted.
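
The randomness in training games can be illustrated with AlphaZero-style temperature sampling over search visit counts. This is only a sketch: the exact scheme AobaKomaochi uses is not described here, and move_probabilities, sample_move and the example visit counts are hypothetical.

```python
import random


def move_probabilities(visit_counts, temperature):
    """Turn visit counts into probabilities proportional to N ** (1 / T).
    A higher temperature flattens the distribution, so weak moves
    (blunders) get picked more often. Requires temperature > 0 and at
    least one move with a non-zero visit count."""
    weights = {m: n ** (1.0 / temperature) for m, n in visit_counts.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def sample_move(visit_counts, temperature, rng=random):
    """Pick a move at random; temperature <= 0 means play the most visited move."""
    if temperature <= 0:
        return max(visit_counts, key=visit_counts.get)
    probs = move_probabilities(visit_counts, temperature)
    moves = list(probs)
    return rng.choices(moves, weights=[probs[m] for m in moves], k=1)[0]


if __name__ == "__main__":
    counts = {"7g7f": 60, "2g2f": 30, "9g9f": 10}  # made-up visit counts
    print(move_probabilities(counts, 1.0))
    print(move_probabilities(counts, 2.0))
    print(sample_move(counts, 1.0))
```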


Game records
From arch000000000000.csa.xz to arch00003200000.csa.xz.
Games from no000000000000.csa to no000000500007.csa were generated with a random function, not with the neural network.
The first game generated by the neural network is no000000500008.csa.
The network is 256x20 blocks, and the replay buffer holds the past 500,000 games.
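
As a small illustration of the naming scheme and the replay buffer size, the sketch below builds a game record file name and the range of game numbers a 500,000-game window would cover. It assumes the window is simply the most recent 500,000 game numbers; record_name and replay_window are hypothetical helpers.

```python
REPLAY_BUFFER_GAMES = 500_000  # replay buffer size stated above


def record_name(game_number):
    """File name of a single game record: 'no' plus a 12-digit, zero-padded number."""
    return f"no{game_number:012d}.csa"


def replay_window(latest_game):
    """First and last game number (inclusive) in the replay buffer,
    assuming it holds the most recent REPLAY_BUFFER_GAMES games."""
    first = max(0, latest_game - REPLAY_BUFFER_GAMES + 1)
    return first, latest_game


if __name__ == "__main__":
    first, last = replay_window(1_000_000)
    print(record_name(first), "to", record_name(last))
```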
Weights
From w000000000001.txt.xz to w000000000271.txt.xz.
The network is a 256 x 20-block ResNet, AlphaZero style.
w001 ... 256x20b, minibatch 128, learning rate 0.01,    wd 0.0002, momentum 0.9,  500000 games. Failed at w009.
w001 ... 256x20b, minibatch 128, learning rate 0.001,   wd 0.0002, momentum 0.9,  500000 games. Restarted with a smaller learning rate.
w321 ... 256x20b, minibatch 128, learning rate 0.0001,  wd 0.0002, momentum 0.9, 3711485 games.
w524 ... 256x20b, minibatch 128, learning rate 0.00001, wd 0.0002, momentum 0.9, 5738768 games.
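
The learning rate steps listed above amount to a piecewise-constant schedule keyed on the number of games played. A minimal sketch, using only the game counts and rates from the list (the failed first run at 0.01 is left out); LR_SCHEDULE and learning_rate are hypothetical names, not taken from the training code:

```python
# (first game count at which the rate applies, learning rate)
LR_SCHEDULE = [
    (0, 0.001),            # restart of w001
    (3_711_485, 0.0001),   # from w321
    (5_738_768, 0.00001),  # from w524
]


def learning_rate(games_played):
    """Return the learning rate in effect after `games_played` games."""
    rate = LR_SCHEDULE[0][1]
    for start, value in LR_SCHEDULE:
        if games_played >= start:
            rate = value
    return rate


if __name__ == "__main__":
    for games in (500_000, 4_000_000, 8_038_347):
        print(games, learning_rate(games))
```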