[1]: https://hlfshell.ai/posts/deepmind-grandmaster-chess-without...
It’s still very cool that they could learn a very good eval function that doesn’t require search. I would’ve liked the authors to throw out the games where the Stockfish fallback kicked in though. Even for a human, mate in 2 vs mate in 10 is the difference between a win and a draw/loss on time.
I also would’ve liked to see a head-to-head with a depth-limited Stockfish. That would tell us approximately how much of the search tree their eval function distilled.
As for the limited search tree, I like the idea! I think it's tough to measure, since the time it takes to search to various depths varies wildly based on the complexity of the position. I feel like you would have to compile a dataset of specific positions identified as requiring significant search depth to find a "good" move.
And limited-depth games would not have been difficult to run. You can run a depth-limited Stockfish on a laptop using the UCI protocol: https://github.com/official-stockfish/Stockfish/wiki/UCI-%26...
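For reference, the UCI exchange for a depth-limited game is tiny. A sketch of what a GUI (or script) sends to the engine's stdin (the `Skill Level` option is Stockfish-specific; the example moves and reply are illustrative):

```
uci                                   # engine lists its id and options, then replies "uciok"
isready                               # engine replies "readyok"
setoption name Skill Level value 5    # optional: Stockfish-specific strength cap
position startpos moves e2e4 e7e5
go depth 5                            # cap the search at depth 5
                                      # engine eventually replies with e.g. "bestmove b1c3"
```

Any engine that speaks UCI accepts `go depth N`, so the same exchange works for other engines too.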
Currently there's a very interesting war between small neural networks on the CPU with high search depth alpha-beta pruning (stockfish NNUE) and big neural networks on a GPU with Monte Carlo search and lower depth (lc0).
So, while machines beating humans is "solved", chess is very far from solved (just ask the guys who have actually solved chess endgames with 8 or fewer pieces).
Even in human chess, people sometimes mistake draw frequency for both sides playing optimally, but there are many games where a winning advantage slips away into a draw.
No computer now or in the foreseeable future will be capable of solving chess. It has an average branching factor of over 30, and games can run over 100 moves.
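The comment's own figures make the point. A back-of-the-envelope sketch, taking the branching factor and game length above at face value (these are rough averages, not measured values):

```python
import math

branching = 30  # average legal moves per position, per the figure above
plies = 200     # 100 full moves = 200 half-moves (plies)

# Rough size of the full game tree: branching ** plies.
# Working in log10 avoids the astronomically large integer.
log10_tree = plies * math.log10(branching)
print(f"game tree ~ 10^{log10_tree:.0f}")  # ~ 10^295
```

For comparison, Shannon's classic estimate of ~10^120 assumed games of about 40 moves; longer games only make exhaustive search more hopeless.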
NNUE already tries to distill a subtree eval into a neural net, but it’s optimized for CPU rather than GPU.
What you’re discussing sounds like intuition with checking, which is pretty close to how humans with a moderate degree of skill behave. I haven’t known enough Chess or Go masters to have any claim on how they think. But most of us don’t want an opponent at that level and if we did, we would certainly find a human, or just play against ourselves.
If you want a computer that plays like a human, you will probably need to imitate the way that a human thinks about the game. This means for example thinking about the interactions between pieces and the flow of the game rather than stateless evaluations.
Grandmaster-Level Chess Without Search
Say I want to play chess with an opponent that is at about the same skill level as me, or perhaps I want to play with an opponent about 100 rating points above me for training.
Most engines let you dumb them down by cutting search depth, but that usually doesn't work well. Sure, you end up beating them about half the time if you cut the search down enough, but it generally feels like they were still outplaying you for much of the game and you won because they made one or two blunders.
What I want is a computer opponent that plays at a level of my choosing but plays a game that feels like that of a typical human player of that level.
Are there such engines?
Basic computer opponents, on the other hand, can make moves all over the place. They look at the board state holistically. This can be very frustrating to play against as a human, who has enough problems just thinking through some subset of the board but is thrown off by the computer again and again.
It's not that bad in chess at least (compared to Go), but it's still something worth keeping in mind if you're trying to make an AI that is fun to play against as an amateur.
It uses a similar approach to Maia but with a different neural network, so it has somewhat better move-matching performance. On top of that, it uses an expectation-maximization algorithm so that the bot will try to exploit your mistakes.
It’s supposedly good up to about 1300, but aside from that, the ability to prompt it can make the style of play somewhat tunable, e.g. aggressive, defensive, etc.
The authors of the best neural-network chess engine wrote about this DeepMind publication.
"What would Stockfish Do?"
A more appropriate title, because Stockfish is a search-based system and DeepMind's approach wouldn't work without it.
Oh, btw, this is (yet another) neurosymbolic system of the "compiling system 2 to system 1" type.
Why compare this to GPT-3.5-turbo-instruct? Is that near SOTA in this space?
This implies the model is around 2500 blitz vs. humans. As blitz Elo ratings are often much higher than those in classical time controls, 2500 on chess.com places it firmly at the 'good but not great' level.
I am very curious to know whether the model suffers from the same eval problems vs the well known "anti-bot" openings that stockfish is susceptible to at limited search depths.
Yeah, no. They are two different rating systems (not Elo, incidentally) with different curves; there isn't a fixed difference you can apply. At the high end of the scale, lichess ratings are below, not above, chess.com ratings. E.g. Magnus Carlsen is 3131 blitz on lichess [0] and 3294 blitz on chess.com [1].
This website [2] tries to translate between the sites, and figures that a 2925 lichess blitz rating (the closest on the website to the 2895 reported in the paper) translates to 3000 on chess.com.
[0] Multiple accounts but this is the one I found with the most blitz games: https://lichess.org/@/DrNykterstein/perf/blitz
[1] https://www.chess.com/member/magnuscarlsen
[2] https://chessgoals.com/rating-comparison/#lichesschesscom
On lichess puzzles, GPT-4o with the compiled prompt scores around 70%; I think the 270M transformer is around 95%.
It wouldn't be competitive against top tier players and AI, but I wouldn't be surprised if it could beat me. 'Instantly' knowing the next move would be a cool trick.
They have managed to create one for 7 pieces. Here's the last update on trying to get to an 8-piece database: https://www.chess.com/blog/Rocky64/eight-piece-tablebases-a-...
> From May to August 2018 Bojun Guo generated 7-piece tables. The 7-piece tablebase contains 423,836,835,667,331 unique legal positions in about 18 Terabytes.
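Those two figures imply a striking storage density. A quick check, taking the quoted "about 18 Terabytes" at face value:

```python
positions = 423_836_835_667_331  # 7-piece unique legal positions, quoted above
size_bytes = 18 * 10**12         # ~18 TB

# Compressed tablebase storage works out to far less than a byte per position.
bits_per_position = size_bytes * 8 / positions
print(f"{bits_per_position:.2f} bits per position")  # ~0.34
```

That under-one-bit density is possible because tablebase outcomes are highly compressible; an 8-piece table would multiply both the position count and the storage by orders of magnitude.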
Chess configurations ≈ 4.8 × 10^44; atoms in the observable universe > 10^70.
https://tromp.github.io/chess/chess.html https://physics.stackexchange.com/questions/47941/dumbed-dow...
You might be able to pull off a low-resolution lookup table. Take some big but manageable number N (e.g. 10^10), sample that many points as evenly as possible over the total space of chessboard configurations, and build a lookup table for those configs. In play, for configs not in the table, interpolate between the nearest points in the table.
The resolution wouldn't be great, but adding search on top could be used to develop an implicit measure of how accurate the function is (i.e., the probability that the move suggested in a position remains unchanged after searching the move tree for better alternatives).
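A minimal sketch of that lookup-plus-nearest-neighbor idea, on a toy 4-square "board" with made-up evals. The encoding, the Hamming distance metric, and the table contents are all hypothetical placeholders; a real version would need a serious position encoding and a distance that respects chess similarity:

```python
def encode(board_squares):
    # board_squares: tuple of piece codes (0 = empty), here just 4 squares
    # as a stand-in for a real 64-square position encoding.
    return tuple(board_squares)

def hamming(a, b):
    # Toy distance: number of squares whose contents differ.
    return sum(x != y for x, y in zip(a, b))

def lookup_eval(position, table):
    """Return the eval of the nearest stored position (1-NN 'interpolation')."""
    key = encode(position)
    if key in table:
        return table[key]          # exact hit in the sampled table
    nearest = min(table, key=lambda k: hamming(k, key))
    return table[nearest]          # fall back to the closest stored config

# Tiny demo table: two stored positions with made-up evals.
table = {
    (1, 0, 0, 2): +0.3,
    (1, 2, 0, 0): -0.1,
}
print(lookup_eval((1, 2, 0, 1), table))  # nearest stored config is (1, 2, 0, 0) -> -0.1
```

Real nearest-neighbor interpolation over 10^10 entries would of course need an approximate-NN index rather than a linear scan, and whether "nearby" encodings actually have similar evals is the crux of the whole scheme.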
It uses paradigmatic PyTorch with easy-to-read code, and the architecture is similar to the current best-performing chess neural nets.