Program your own Chess Engine with Monte Carlo Tree Search

Deep Blue beats Kasparov

The ChatGPT moment of my childhood happened in 1997, when IBM's Deep Blue supercomputer defeated the reigning world chess champion Garry Kasparov, winning the six-game match 3.5 to 2.5. It was perhaps the first time that a significant demonstration of AI made headlines worldwide. It was a real eye-opener for me in school and, apart from the Terminator films, an inspiration for a career forged in the universities of Australia and Belgium a decade or more later. Since then, AI-based chess engines have grown ever more powerful and accessible: Chess.com lets you play on your phone against cloud-hosted engines that are superior to Deep Blue. To a schoolboy in the 90s, this stuff was close to magic - I had almost no idea how one would go about building such a thing, and I had no one to ask or to help me with such an endeavour. This post is for someone like my 16-year-old self, who is interested in building a chess engine but doesn't know where to start. We motivate the problem by looking at how we would naively approach the programming task, show how to create game trees, and then go on to program an MCTS-based chess engine in a few lines of code.

We build a chess engine without a neural net (so no training data required) that can run on your laptop. This was the first game that the engine I built with MCTS played:




Alpha-Beta search

So - how did Deep Blue work? 
In fact, let's back up a bit and think about how you or I play chess. We know the legal moves at each state of the board, and we want to select a move that maximizes our chances of winning the game. First, we look for any obvious moves that would capture an opposing piece. We then think through each possible move and our opponent's responses to it, then our move to counter our opponent's move, and so on. We play this movie in our head a number of moves ahead for every possible move we can make, and the best players (presumably) can picture this movie many moves further ahead than the rest of us.
The way we would do this with a computer program is to store a large tree of moves in memory, with each possible move of ours spawning a number of possible opposing moves. This is called a minimax search tree, and it can be cut down with a pruning technique called alpha-beta search - this is exactly what IBM's engineers used to defeat Kasparov.

The minimax algorithm is a decision-making process often used in game theory. It provides a strategy for a two-player zero-sum game like chess, and the resulting pair of strategies is a Nash equilibrium. We have two players: MAX, the main player, who aims to win by maximizing their own score, and MIN, the opponent, who aims to win by minimizing the score of MAX.

The game is typically represented as a tree or graph, where each node represents a potential state of the game. The game starts from a root (MAX) node and branches out according to the number of possible moves at that state of the board. Each parent node spawns child nodes corresponding to the allowable moves. Nodes are given a value according to a heuristic evaluation function, which estimates the "goodness" of the game state for the players. MAX and MIN layers alternate all the way down to the depth of the tree.

The goal for each player is to choose the move that maximizes (for MAX) or minimizes (for MIN) the value of the state they can reach. The values are computed starting from the leaf nodes and are then propagated upwards to the root of the tree. This way, the root node eventually holds the value of the best possible move that MAX can make, considering the optimal responses of MIN to each of MAX's possible moves.
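This value propagation is short enough to sketch in code. The sketch below is illustrative: the tree is just nested lists of leaf scores (taken from the example tree walked through in the next section), whereas in a real engine each node would be a board state and its children the states reachable by legal moves.

def minimax(node, is_max):
    # Leaf node: return its heuristic evaluation.
    if isinstance(node, (int, float)):
        return node
    # Recurse into the children, alternating MAX and MIN layers.
    child_values = [minimax(child, not is_max) for child in node]
    # MAX picks the largest child value, MIN the smallest.
    return max(child_values) if is_max else min(child_values)

# The example tree from the next section: leaf pairs (2, 3), (5, 9), (7, 4), (5, 6).
tree = [[[2, 3], [5, 9]], [[7, 4], [5, 6]]]
print(minimax(tree, is_max=True))  # prints 6, the value of MAX's best move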


Skip the next section (which expands on alpha-beta search) and go directly to MCTS if you want the TL;DR version.


Binary minimax tree for a 2-player game

Consider the above tree - a binary tree for visualization purposes, although in chess each node would have as many children as there are legal moves. Such a tree is evaluated to a pre-determined number of levels (the depth), and a depth-first search propagates values up the tree to determine the best move MAX can make for every possible response by the opponent (MIN), evaluated that many moves into the future. In the above tree, the root node is at level 0 and the leaf nodes are at level 3. We take the maximum (3) of the bottom two leaf nodes (2 and 3) and bubble it up to the level 2 MAX node, and from there up to the level 1 MIN node. Similarly, the maximum of 5 and 9 - that is, 9 - is bubbled up to level 2. The minimum of 3 and 9 is 3, and this is stored in the first level 1 node. The other two level 2 nodes are similarly evaluated to 7 and 6; their minimum, 6, is assigned to the second level 1 node. The maximum of 3 and 6 is 6, which is assigned to the level 0 (root) node, and that is the value of the best move MAX can make. So the program would play the move corresponding to the second child of the root, after evaluating the minimax tree to a depth of 3.


You can imagine that such a tree grows exponentially, so to keep it manageable we only evaluate the tree up to a reasonable depth and use a trick called alpha-beta pruning to cut off branches that cannot change the current estimate of the best move.


Alpha and beta are parameters that represent the current best-known lower and upper bounds on the achievable result. Alpha is the lower bound - the score the maximizing player (MAX) is already guaranteed - while beta is the upper bound, associated with the minimizing player (MIN).


As the search progresses, the alpha value can only increase or stay the same, as it represents the best score MAX is guaranteed so far. Conversely, the beta value can only decrease or stay the same, as it represents the best (lowest) score MIN is guaranteed so far.


Let's consider alpha-beta pruning in the previous minimax game tree. 
The process begins with a depth-first search that evaluates the level 3 leaf nodes (great-grandchildren of the root) with values 2 and 3. Their maximum, 3, becomes the value of the MAX node at level 2 and is backed up to its parent at level 1, where the MIN node's beta is set to 3. The value is backed up further to the level 0 root node, where alpha is set to 3.



Next, the algorithm examines the other grandchildren of that same level 1 MIN node, under its second level 2 MAX child. It encounters a leaf with value 5, which is larger than the MIN node's current beta (3). Since this MAX node will be worth at least 5, MIN would never choose it over the option already worth 3, so the branch is pruned and the remaining leaf (9) is never examined.



Subsequently, the algorithm moves on to the second level 1 MIN node and its level 2 MAX children. Taking the maximum of the leaf nodes 7 and 4 at level 3 and backing it up to level 2 sets beta at the level 1 parent to 7.

Further, considering the leaf nodes 5 and 6 at level 3, the maximum value is 6, and this value is backed up to level 2. It is then propagated to level 1, where the current beta is 7; since the new backed-up value, 6, is less than the current beta, beta is updated to 6.

Finally, at the root node (level 0), the maximum of the level 1 nodes' beta values (3 and 6) is selected, updating alpha at the root to 6. This is the optimal value the MAX player can achieve under the assumption of optimal play from both sides.

Final state of alpha-beta search 
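Putting the walkthrough above into code, here is a minimal sketch of minimax with alpha-beta pruning on the same toy tree (nested lists again stand in for board states; the alpha >= beta test is the cut-off just described):

def alphabeta(node, is_max, alpha=float("-inf"), beta=float("inf")):
    # Leaf node: return its heuristic evaluation.
    if isinstance(node, (int, float)):
        return node
    if is_max:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:   # MIN already has a better option elsewhere,
                break           # so the remaining children are pruned
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:   # MAX already has a better option elsewhere
                break
        return value

tree = [[[2, 3], [5, 9]], [[7, 4], [5, 6]]]
print(alphabeta(tree, is_max=True))  # prints 6; the leaf with value 9 is never visited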


IBM's engineers hand-crafted evaluation functions for the nodes in consultation with chess grandmasters, built databases of typical openings and endgames, and designed custom chess chips that parallelized the alpha-beta minimax tree search, allowing Deep Blue to consider 200 million chess positions a second.

But how do we get on with the business of writing a simple chess engine ourselves? We don't want to design a complex evaluation function, and we don't want to encode special openings and endgames; we just don't have the data, and we want to get something up and running quickly. That's where Monte Carlo Tree Search, or MCTS, comes in.

Monte Carlo Tree Search (MCTS)
MCTS is DeepMind's algorithm of choice - they used it in AlphaGo, famous for beating the Go champion Lee Sedol in 2016, and in subsequent systems including AlphaZero. Until then, Go was considered the ultimate frontier for AI game algorithms, because it could not be brute-forced the way chess can: its game complexity is roughly a googol (10 raised to the power of 100) times that of chess.

MCTS is a heuristic search algorithm which also builds up a tree of moves to arrive at the best possible move at each state of the game. The "Monte Carlo" refers to the fact that random moves are used to expand the tree (in general, the term Monte Carlo describes statistical methods that rely on repeated random sampling, named after the famous casino). The tree is expanded from the current game state by simulating a potential game to its termination. This simulation (also called a rollout), once it ends in a win, loss, or draw, is used to propagate statistics back up through all the nodes along the traversed path. MCTS comprises 4 discrete steps:

1. Selection: Starting at the root node (representing the current game state), a path is traced down the tree according to a policy, often the Upper Confidence Bound for Trees (UCT) policy (sketched just after this list). This policy balances exploitation (picking the move that currently seems best) and exploration (picking a move that might turn out to be good after more testing).


2. Expansion: When the selection reaches a leaf node (a node that doesn't yet have all possible moves simulated), a child node is created (at random) from the allowed moves at that board state.


3. Simulation (or Rollout): A game is simulated from the newly expanded node's state to a terminal state using a rollout policy, often making random moves or following some lightweight strategy (or policy). For the purposes of this post, we will use a random rollout.


4. Backpropagation: Once the simulation ends, the outcome (win, loss, or draw) is propagated back up the tree, updating the statistics of all nodes along the traversed path. Each node maintains a record of the total number of simulations passing through it and the number of wins associated with it.
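The UCT score used in the selection step can be written down in a couple of lines. For a child with w wins over n visits, whose parent has been visited N times, UCT = w/n + c * sqrt(ln(N)/n): the first term rewards moves that have done well so far (exploitation), the second rewards moves that have rarely been tried (exploration). A minimal sketch follows; the choice of c = sqrt(2) is a common default, not a value mandated by the algorithm.

import math

def uct_score(wins, visits, parent_visits, c=math.sqrt(2)):
    # Unvisited children get an infinite score so they are always tried first.
    if visits == 0:
        return float("inf")
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)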


The Monte Carlo Tree Search (MCTS) algorithm performs a number of rollouts (simulations) that is specified by the "iterations" parameter in the MCTS function (code given below). By default, this parameter is set to 1000, meaning that unless a different number is specified when calling the function, the algorithm will perform 1000 rollouts per move.

Here's how it works:

1. The function starts by creating a root node from the current game board.

2. It then enters a loop that runs for the specified number of rollouts (I used 1000 rollouts, which took 5-10 seconds per move on average on my 2021 MacBook). For each iteration, it does the following:

    a. Selects a node from the tree using the UCT (Upper Confidence Bound applied to Trees) formula, which balances exploitation and exploration in the tree.

    b. If the selected node is not terminal (i.e., the game is not over at this node) and is not fully expanded, it expands the tree by creating a child node for a random legal move.

    c. It simulates a random playout from the newly expanded node until the game is over.

   d. It then backpropagates the result of the simulation through the tree, updating the visit and win counts of the nodes along the way.

3. After all the rollouts are completed, it selects the move associated with the child of the root that was visited the most times as the best move.

I used the python-chess library to set up the game, generate the allowable moves, and so on, and the MCTS code for the chess engine is below.
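What follows is a minimal, self-contained sketch of such an engine, assuming the python-chess package; the Node class, the exploration constant, and the helper names are illustrative choices rather than the only way to structure it.

import math
import random
import chess

class Node:
    def __init__(self, board, parent=None, move=None):
        self.board = board                          # board state at this node
        self.parent = parent
        self.move = move                            # move that led from parent to this node
        self.children = []
        self.untried_moves = list(board.legal_moves)
        self.visits = 0
        self.wins = 0.0                             # wins for the player who moved into this node

    def is_fully_expanded(self):
        return len(self.untried_moves) == 0

    def best_child(self, c=math.sqrt(2)):
        # UCT: exploit children with high win rates, explore rarely visited ones.
        return max(
            self.children,
            key=lambda ch: ch.wins / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )

def select(node):
    # Walk down the tree until reaching an expandable or terminal node.
    while not node.board.is_game_over() and node.is_fully_expanded():
        node = node.best_child()
    return node

def expand(node):
    # Create one child for a random untried legal move.
    move = node.untried_moves.pop(random.randrange(len(node.untried_moves)))
    child_board = node.board.copy()
    child_board.push(move)
    child = Node(child_board, parent=node, move=move)
    node.children.append(child)
    return child

def rollout(board):
    # Play random moves until the game ends; return the result string.
    board = board.copy()
    while not board.is_game_over():
        board.push(random.choice(list(board.legal_moves)))
    return board.result()   # "1-0", "0-1" or "1/2-1/2"

def backpropagate(node, result):
    # Update visit and win counts along the path back to the root.
    while node is not None:
        node.visits += 1
        mover_is_white = not node.board.turn        # the player who just moved into this node
        if result == "1-0" and mover_is_white:
            node.wins += 1
        elif result == "0-1" and not mover_is_white:
            node.wins += 1
        elif result == "1/2-1/2":
            node.wins += 0.5
        node = node.parent

def mcts(board, iterations=1000):
    root = Node(board.copy())
    for _ in range(iterations):
        node = select(root)
        if not node.board.is_game_over():
            node = expand(node)
        result = rollout(node.board)
        backpropagate(node, result)
    # The best move is the one leading to the most-visited child of the root.
    return max(root.children, key=lambda ch: ch.visits).move

# Example: let the engine play against itself.
if __name__ == "__main__":
    board = chess.Board()
    while not board.is_game_over():
        board.push(mcts(board, iterations=1000))
    print("Result:", board.result())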



So there you have it - a full chess engine in just a few lines of code. I leave the conversion of individual board states to chess images as an exercise for the reader. In the first game I got MCTS to play against itself, it checkmated (itself?) in 278 moves.
The engine can be further improved by combining MCTS with a neural net, which is what DeepMind did with AlphaGo and AlphaZero. The neural net would be used to select moves instead of performing a random rollout, making the search more directed and efficient. The net's policy (given the state of the game, predict a move or a distribution over moves) could be trained from online datasets of games. Alternatively, by having the current version of MCTS play against itself for many games, one could build up a database of games for training the network policy.
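As a purely hypothetical sketch of that direction, a trained policy could slot in where the random rollout currently sits; here policy_net is an assumed callable that returns one probability per legal move and is not part of the engine above - it would have to be trained separately.

import random

def guided_rollout(board, policy_net):
    # Like rollout(), but moves are sampled from the policy network's
    # probabilities instead of uniformly at random.
    board = board.copy()
    while not board.is_game_over():
        moves = list(board.legal_moves)
        probs = policy_net(board, moves)    # assumed interface: one probability per move
        board.push(random.choices(moves, weights=probs, k=1)[0])
    return board.result()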


