It spent the first few minutes analyzing the image and cross-checking various slices of it to make sure it understood the problem. Then it spent the next 6-7 minutes trying to work through various angles to the problem analytically. It decided this was likely a mate-in-two (part of the training data?), but went down the path that the key to solving the problem would be to convert the position into something more easily solvable first. At that point it started trying to pip install all sorts of chess-related packages, and when it couldn't get that to work it started writing a simple chess solver in Python by hand (which didn't work either). At one point it thought the script had found a mate-in-six that turned out to be due to a script bug, but I found it impressive that it didn't just trust the script's output - instead, it analyzed the proposed solution and determined the nature of the bug that caused it. Then it gave up on the script and went back to analyzing for five more minutes, at which point the thinking got cut off with an internal error.
15 minutes total, didn’t solve the problem, but fascinating! There were several points where if the model were more “intelligent”, I absolutely could see it reasoning it out following the same steps.
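For reference, the brute-force check the model was apparently trying to hand-roll is only a few lines on top of the python-chess library (presumably one of the packages it tried to pip install; the position below is an illustrative two-rook ladder, not the puzzle from the post):

```python
import chess  # the python-chess package


def has_mating_move(board):
    """True if the side to move can deliver checkmate immediately."""
    for move in list(board.legal_moves):
        board.push(move)
        mate = board.is_checkmate()
        board.pop()
        if mate:
            return True
    return False


def _reply_is_refuted(board, reply):
    """True if, after this defensive reply, the attacker still has a mate."""
    board.push(reply)
    ok = has_mating_move(board)
    board.pop()
    return ok


def find_mate_in_two(board):
    """Return a first move that forces mate in at most two moves, else None."""
    # Materialize the move list: we mutate the board while iterating.
    for m1 in list(board.legal_moves):
        board.push(m1)
        if board.is_checkmate():  # mate in one also qualifies
            board.pop()
            return m1
        # Not stalemate/draw, and every defensive reply still allows mate:
        if not board.is_game_over() and all(
            _reply_is_refuted(board, reply) for reply in list(board.legal_moves)
        ):
            board.pop()
            return m1
        board.pop()
    return None


# Illustrative position (not the one from the post): a two-rook ladder,
# White to move and mate in two, e.g. 1.Ra7 Kg8 2.Rb8#.
board = chess.Board("7k/8/R7/1R6/8/8/8/6K1 w - - 0 1")
print(find_mate_in_two(board))  # prints a forcing first move in UCI, e.g. a6a7
```

The bug the model hit (a false mate-in-six) is the typical failure mode of exactly this kind of hand-rolled search: forgetting to restore the board, or quantifying over the defender's moves with "some" instead of "all".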
The fact that gpt-4.5 gets 85% correctly solved is unexpected and somewhat scary (if the model was not trained on this).
That means it can literally never win a chess match, given that an intentional illegal move is an immediate loss.
It can't beat a human who can't play chess. It literally can't even lose properly. It will disqualify itself every time.
--
> It shows clearly where current models shine (problem-solving)
Yeh - that's not what's happening.
I say that as someone who pays for and uses an LLM pretty much every day.
--
Also - I didn't fact check any of the above about playing chess. I choose to believe.
- Check the obvious wrong moves.
- Ask what I need to keep in order to win, even if only the black king is left. The answer: I need all three of my pieces to win eventually, even with just the black king on the board.
- So any move that loses my pawn or rook results in failure.
- So the only thing I can do with the rook is move it vertically; any horizontal move allows Black to take my pawn. The king and the pawn don't have many options, and they all either lose the pawn or basically skip a turn while changing the situation slightly for the worse, which makes a mate on the next move unlikely.
- Taking a pawn with the rook results in the loss of the rook, which is just as bad.
- Let's look at the square next to the pawn. I'd still be protecting my pawn, but my rook would be in danger. But if Black takes the rook, I can just move my pawn forward for mate; if they don't, I can move the rook forward for mate. Solved.
So I skipped the running-a-program and googling part, not because it didn't come to my mind but because I wanted a different kind of challenge than extracting information from the internet or running an unfamiliar piece of software.
I've never met a human player that suddenly says 'OK, I need Python to figure out my next move'.
I'm not a good player - usually I just do ten-minute matches against the weakest Stockfish settings so as not to be annoying to a human - and I figured this one out in a couple of minutes because there are very few options: taking with the rook doesn't work, taking with the pawn doesn't either, so it has to be a non-taking move; the king can't do anything useful, so it has to be the rook, and in these puzzles the key move is typically a sacrifice. And it was.
At no point during my process would I be counting pixels in the image. It feels very clearly like a machine that mimics human behavior without understanding where that behavior comes from.
But I wasn't thinking in text, I was thinking graphically. I was visualizing the board. It's not beyond the realm of possibility that you can tokenize graphics. When is that coming?
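It's already here, in a sense: vision transformers tokenize images by cutting them into fixed-size patches and treating each flattened patch as one token. A minimal numpy sketch (the 224px image and 16px patch sizes match ViT-Base, chosen purely for illustration):

```python
import numpy as np


def patchify(image, patch=16):
    """Split an H x W x C image into a sequence of flattened patch 'tokens'."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must divide evenly into patches"
    # Split rows and columns into blocks of `patch` pixels each.
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    # Reorder so each patch's pixels are contiguous, then flatten per patch.
    return grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)


tokens = patchify(np.zeros((224, 224, 3)))
print(tokens.shape)  # (196, 768): a 14x14 grid of patches, each 16*16*3 values
```

A real model then linearly projects each 768-dim patch vector into the transformer's embedding space, so the sequence is processed exactly like text tokens.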
> Chess Puzzle Checkmate in 2 White
does it mean we are white, or does it mean we're trying to checkmate white?
Claude reigns supreme.