[1]: https://hlfshell.ai/posts/deepmind-grandmaster-chess-without...
It’s still very cool that they could learn a very good eval function that doesn’t require search. I would’ve liked the authors to throw out the games where the Stockfish fallback kicked in though. Even for a human, mate in 2 vs mate in 10 is the difference between a win and a draw/loss on time.
I also would’ve liked to see a head-to-head with a depth-limited Stockfish. That would tell us approximately how much of the search tree their eval function distilled.
As for the limited search tree, I like the idea! I think it's tough to measure, since the time it takes to search to various depths varies wildly based on the complexity of the position. I feel like you would have to compile a dataset of specific positions identified as requiring significant search depth to find a "good" move.
And limited-depth games would not have been difficult to run. You can run a depth-limited Stockfish on a laptop using the UCI protocol: https://github.com/official-stockfish/Stockfish/wiki/UCI-%26...
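For reference, the UCI exchange for a depth-limited game is tiny. A sketch of what a GUI (or script) sends to the engine's stdin (the `Skill Level` option is Stockfish-specific; the example moves and reply are illustrative):

```
uci                                   # engine lists its id and options, then replies "uciok"
isready                               # engine replies "readyok"
setoption name Skill Level value 5    # optional: Stockfish-specific strength cap
position startpos moves e2e4 e7e5
go depth 5                            # cap the search at depth 5
                                      # engine eventually replies with e.g. "bestmove b1c3"
```

Any engine that speaks UCI accepts `go depth N`, so the same exchange works for other engines too.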
Currently there's a very interesting war between small neural networks on the CPU with high search depth alpha-beta pruning (stockfish NNUE) and big neural networks on a GPU with Monte Carlo search and lower depth (lc0).
So, while machines beating humans is "solved", chess is very far from solved (just ask the guys who have actually solved chess endgames with 8 or fewer pieces).
Even in human chess, people sometimes mistake draw frequency for both sides playing optimally, but there are many games where a winning advantage slips away into a draw.
No computer now or in the foreseeable future will be capable of solving chess. It has an average branching factor of over 30, and games can run over 100 moves.
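The comment's own figures make the point. A back-of-the-envelope sketch, taking the branching factor and game length above at face value (these are rough averages, not measured values):

```python
import math

branching = 30  # average legal moves per position, per the figure above
plies = 200     # 100 full moves = 200 half-moves (plies)

# Rough size of the full game tree: branching ** plies.
# Working in log10 avoids the astronomically large integer.
log10_tree = plies * math.log10(branching)
print(f"game tree ~ 10^{log10_tree:.0f}")  # ~ 10^295
```

For comparison, Shannon's classic estimate of ~10^120 assumed games of about 40 moves; longer games only make exhaustive search more hopeless.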
NNUE already tries to distill a subtree eval into a neural net, but it’s optimized for CPU rather than GPU.
What you’re discussing sounds like intuition with checking, which is pretty close to how humans with a moderate degree of skill behave. I haven’t known enough Chess or Go masters to have any claim on how they think. But most of us don’t want an opponent at that level and if we did, we would certainly find a human, or just play against ourselves.
If you want a computer that plays like a human, you will probably need to imitate the way that a human thinks about the game. This means for example thinking about the interactions between pieces and the flow of the game rather than stateless evaluations.
Grandmaster-Level Chess Without Search
Say I want to play chess with an opponent that is at about the same skill level as me, or perhaps I want to play with an opponent about 100 rating points above me for training.
Most engines let you dumb them down by cutting search depth, but that usually doesn't work well. Sure, you end up beating them about half the time if you cut the search down enough, but it generally feels like they were still outplaying you for much of the game and you won because they made one or two blunders.
What I want is a computer opponent that plays at a level of my choosing but plays a game that feels like that of a typical human player of that level.
Are there such engines?
Basic computer opponents, on the other hand, can make moves all over the place. They look at the board state holistically. This can be very frustrating to play against as a human, who has enough problems just thinking through some subset of the board but is thrown off by the computer again and again.
It's not that bad in chess at least (compared to Go), but it's still something worth keeping in mind if you're trying to make an AI that is fun to play against as an amateur.
It uses a similar approach to Maia but with a different neural network, so it has somewhat better move-matching performance. On top of that, it uses an expectation-maximization algorithm so that the bot will try to exploit your mistakes.
It’s supposedly good up to about 1300, but aside from that, the ability to prompt it can make the style of play somewhat tunable, e.g. aggressive, defensive, etc.
The authors of the best neural-network chess engine wrote about this DeepMind publication.
"What would Stockfish Do?"
A more appropriate title, because Stockfish is a search-based system and DeepMind's approach wouldn't work without it.
Oh, btw, this is (yet another) neurosymbolic system of the "compiling system 2 to system 1" type.
Why compare this to GPT-3.5-turbo-instruct? Is that near SOTA in this space?
This implies the model is around 2500 blitz vs. humans. As blitz Elo ratings are often much higher than those in classical time controls, 2500 on chess.com places it firmly at the 'good but not great' level.
I am very curious to know whether the model suffers from the same eval problems vs the well known "anti-bot" openings that stockfish is susceptible to at limited search depths.
Yeah, no. They are two different rating systems (not Elo, incidentally) with different curves; there isn't a fixed difference you can apply. At the high end of the scale, lichess ratings are below, not above, chess.com ratings. E.g. Magnus Carlsen is 3131 blitz on lichess [0] and 3294 blitz on chess.com [1].
This website [2] tries to translate between the sites, and figures that a 2925 lichess blitz rating (the closest on the website to the 2895 reported in the paper) translates to 3000 on chess.com.
[0] Multiple accounts but this is the one I found with the most blitz games: https://lichess.org/@/DrNykterstein/perf/blitz
[1] https://www.chess.com/member/magnuscarlsen
[2] https://chessgoals.com/rating-comparison/#lichesschesscom
On lichess puzzles, GPT-4o with the compiled prompt scores around 70%; I think the 270M transformer is around 95%.
It wouldn't be competitive against top tier players and AI, but I wouldn't be surprised if it could beat me. 'Instantly' knowing the next move would be a cool trick.
They have managed to create one for 7 pieces. Here's the last update on trying to get to an 8-piece database: https://www.chess.com/blog/Rocky64/eight-piece-tablebases-a-...
> From May to August 2018 Bojun Guo generated 7-piece tables. The 7-piece tablebase contains 423,836,835,667,331 unique legal positions in about 18 Terabytes.
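Those two figures imply a striking storage density. A quick check, taking the quoted "about 18 Terabytes" at face value:

```python
positions = 423_836_835_667_331  # 7-piece unique legal positions, quoted above
size_bytes = 18 * 10**12         # ~18 TB

# Compressed tablebase storage works out to far less than a byte per position.
bits_per_position = size_bytes * 8 / positions
print(f"{bits_per_position:.2f} bits per position")  # ~0.34
```

That under-one-bit density is possible because tablebase outcomes are highly compressible; an 8-piece table would multiply both the position count and the storage by orders of magnitude.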
Chess configurations ≈ 4.8 × 10^44; atoms in the observable universe > 10^70.
https://tromp.github.io/chess/chess.html https://physics.stackexchange.com/questions/47941/dumbed-dow...
You might be able to pull off a low-resolution lookup table. Take some big but manageable number N (e.g. 10^10), sample that many points as evenly as possible over the total space of chessboard configurations, and build a lookup table for those configs. In play, for configs not in the table, interpolate between the nearest points in the table.
The resolution wouldn't be great, but adding search on top could be used to develop an implicit measure of how accurate the function is (i.e., the probability that the move suggested in a position remains unchanged after searching the move tree for better alternatives).
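A minimal sketch of that lookup-plus-nearest-neighbor idea, on a toy 4-square "board" with made-up evals. The encoding, the Hamming distance metric, and the table contents are all hypothetical placeholders; a real version would need a serious position encoding and a distance that respects chess similarity:

```python
def encode(board_squares):
    # board_squares: tuple of piece codes (0 = empty), here just 4 squares
    # as a stand-in for a real 64-square position encoding.
    return tuple(board_squares)

def hamming(a, b):
    # Toy distance: number of squares whose contents differ.
    return sum(x != y for x, y in zip(a, b))

def lookup_eval(position, table):
    """Return the eval of the nearest stored position (1-NN 'interpolation')."""
    key = encode(position)
    if key in table:
        return table[key]          # exact hit in the sampled table
    nearest = min(table, key=lambda k: hamming(k, key))
    return table[nearest]          # fall back to the closest stored config

# Tiny demo table: two stored positions with made-up evals.
table = {
    (1, 0, 0, 2): +0.3,
    (1, 2, 0, 0): -0.1,
}
print(lookup_eval((1, 2, 0, 1), table))  # nearest stored config is (1, 2, 0, 0) -> -0.1
```

Real nearest-neighbor interpolation over 10^10 entries would of course need an approximate-NN index rather than a linear scan, and whether "nearby" encodings actually have similar evals is the crux of the whole scheme.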
It uses paradigmatic PyTorch with easy-to-read code, and the architecture is similar to the current best-performing chess neural nets.