In my previous postHow To Win Sudoku, we learned how to teach computers to solve the puzzle Sudoku. If you haven’t read it, go ahead and give it a quick read. But that was really just a way to get our feet wet, before diving into more sophisticated methods of game playing agents. Especially those methods that can make strategic moves against an opponent!
In this lesson, we’ll explore a popular algorithm calledminimax.We’ll also learn some of its friendly neighborhood add-on features likeheuristic scores,iterative deepening,andalpha-beta pruning.Using these techniques, we can create a more flexible and powerful game playing agent. It’ll be able to compete in many challenges, including the strategy game Isolation.
Don’t Get Stranded
Isolation (or Isola) is a turn-based strategy board game where two players try to confine their opponent on a 7x7 checker-like board. Eventually, they can no longer make a move (thus isolating them).
Each player has one piece, which they can move around like a queen in chess — up-down, left-right, and diagonal. There are three conditions under which the pieces can be moved —
They cannot place their piece on an already visited square.
They cannot cross over already visited squares (squeezing through them diagonally is OK).
They cannot cross over each other’s piece.
In the above picture, you can see from the black squares that both players have placed their pieces on various parts of the board. But as the game progressed, it shows that the yellow player still has three possible moves (up and to the right, right one square, and right two squares). But the blue player is out of options. Hence, the yellow player is the winner here.
Now this may seem like a fairly simple game — and to be honest, it is. It’s not like we’re playingpokerorStarcraft. Yet, there is still an enormous amount of possible moves either player can make at any point during the game.
In puzzles like Sudoku, there’s an “answer” which we want to solve for. But there is no answer when it comes to strategy games.
We’re playing against another opponent — like a person, computer, or acat detective. This requires strategy and some thought into how the game may turn out as it rolls along.
Such games can evolve and produce an absurd amount of possible outcomes. So we need to think of how we can choose the best possible move, without spending the amount of time it took for cats to populate the Earth.
Okay, no more cats!
Mighty Minimax And Friends
Now that you know how to play Isolation, let’s take a look at how we can use theminimaxalgorithm — a staple in the AI community. We’ll also look atheuristic scores,iterative deepening, andalpha-beta pruning.Together with these, we can build a competitive AI agent.
As mentioned, theminimaxalgorithm is very popular for teaching AI agents how to play turn-based strategy games. The reason being that, it takes into account all next possible moves both players can take at any given time during the game. With this information, it then attempts to minimize the opponent player’s advantage while maximizing the agent’s, at every turn the AI agent gets to play.
Now, how does this look? Well, similar to how an AI agent would play a game like Sudoku, we can model the next possible moves either player can make via asearch tree. However, we’ll need to use a search tree with variable breadths — or in other words, a tree level’s width. The reason being that, there are a variable number of moves each player can make at any given time during the game.
The tree shown above represents the next moves available during a game of Isolation. It has a 2x3 grid, with the bottom right square being unreachable. As you can see, the two players are a blue circle and a red cross. The top of the tree (the root node) illustrates a move made by the red player. The middle level illustrates the next possible moves by the blue player. And the third level illustrates the possible moves by the red player, given the previous move made by the blue player.
Now you might be wondering, what the heck are those triangles below each move?
The downward triangle represents a location in the tree where minimax willminimizethe opponent’s advantage. Whereas, the upward triangles are the locations where minimaxmaximizesthe agent’s advantage.
However, minimax can only know either players’ advantage if it knows the paths in the tree that lead to a victory for either player. This means minimax must traverse to the very bottom of the tree for every possible series of moves. Next, it has to assign some score (e.g., +1 for a win and -1 for a loss), and propagate those numbers up through the tree. This way, each game state or node in the tree has information about which player has the most to gain from any potential move.
In this picture, we can make a couple of observations. First minimax assigns a number to the final game outcomes at theleaf nodes. Then it propagates them upward through the tree, performing minimizations and maximizations on the way. Once minimax finishes filling in the tree, whenever it’s the AI agent’s turn, it’ll know which moves will likely lead to a win or loss.
The second level after the root node shows the next possible moves for the blue player (our AI agent). Our agent wants to maximize the available scores during its turn. So it would choose the move represented in the right-most node following the root node. Super cool!
But does it make sense to simply assign a +1 or -1 to game outcomes? Shouldn’t this score take into consideration how the game is won or lost? Spoiler alert: the answer is yes!
In the world of strategy games, a heuristic score is essentially a subjective value we assign to some game state. This value is based on our understanding of how the game is won and lost. By choosing a well-thought-out heuristic score, we’re able to teach our AI agent how to best select its next moves while playing the game Isolation.
Now, there’s probably an unlimited number of heuristic scores we could come up with. But here we’ll only look at a couple of them, apart from thenaïve score (NS)of +1 and -1.
One idea could be to count all the next possible moves either player has at any given time — since more possible moves mean less chance of being isolated. We’ll call this theopen move score (OMS). Another idea could be to use the value obtained from OMS and subtracting the number of next possible moves the opponent has. The reason being, each player wants to both increase their amount of moves while decreasing their opponent’s. We’ll call this theimproved score (IS).
The above figure shows the win rates over many simulated isolation games played between AI agents, using different heuristic scores. Now, you can see how different our scores did during actual gameplay. However, there were some heuristic scores that outperformed the ones we came up with.
Interestingly enough, the top two are almost exactly the same as improved score. We’ll call themaggressive improved score (AIS)andsuper aggressive improved score (SAIS). But there’s a slight difference between these scores and the original. The top two scores apply a factor of two and three respectively, to the value you subtract with (the number of moves available to the opponent), when computing the improved score.
Maybe you can discover an optimal “aggressive factor” to apply when computing this score! Another spoiler alert — better values exist.
But what if we come up with a heuristic score that takes a lot of time to compute? What if the tree is huge? Will our AI agent have enough time to find its next best moves, while still being responsive enough during gameplay?
Now, we know that our AI agent can model all possible moves using a search tree and its nodes’ corresponding heuristic score. But unfortunately, when playing Isolation our tree will be massive. Like, really massive. It would take more time to search the tree and compute these values than there are years since the big bang!
OK, if you have a supercomputer, maybe this is an exaggeration. But we don’t all have supercomputers lying around (yet).
Enteriterative deepening — the go-to time management strategy for game playing agents. By using this method, we can cut the compute and search time down to a maximum time of our choosing (like 2 seconds, for example). This way our AI agent can respond at least as fast as a human could.
But how does iterative deepening work? Simply put, it allows minimax to move level by level and compute heuristic scores until a certain time limit. Once this time limit is reached, the AI agent is forced to use the best move it discovered while having moved deeper and deeper down the tree.
Now this provides some insight into how difficult it can be . To create an AI agent that is smart and responsive enough for strategy games, can be quite tricky — even for AI wizards. Especially if such games contain a world of possibilities.
Unfortunately, the number of moves the AI agent can “imagine” going forward is limited. So it’s possible it could make a decision that ultimately leads to its demise. This is a well-known phenomenon called thehorizon effect. But we still need to look at arguably the most effective time slashing algorithm used when searching trees.
OK, those are raisins and not prunes, but still — how were these ever a thing? I mean, seriously, a raisin blues group?
You might have already guessedalpha-beta pruninghas nothing to do with prunes, and more about reducing the size (pruning) of our search tree. When we have a very large search tree, it turns out that it’s not always necessary to traverse every node when using minimax.
We need to give minimax the ability to stop searching a particular region of the tree, when it finds the guaranteed minimum or maximum of that particular level.
If we can do that, this can greatly reduce our AI agent’s response time and improve overall performance.
How does Alpha-beta pruning work?
The minimax algorithm moves through the tree usingdepth-first search. Meaning, it traverses through the tree going from left to right, and always going the deepest it can go. It then discovers values that logically must be assigned to nodes directly above it, without ever looking at other branches of the tree.
Alpha-beta pruning allows minimax to make just as good of decisions as minimax could do alone, but with a higher level of performance.
Consider the following image, in which we have a tree with various scores assigned to each node. Some nodes are shaded in red, indicating there’s no need to review them.
At the bottom left of the tree, minimax looks at the values 5 and6on the bottommaxlevel, and determines that5must be assigned to theminlevel right above it. Makes sense. However, after looking at7and4of the rightmax level branch, it realizes that the aboveminlevel node must be assigned a maximum value of4. Since the secondmaxlevel right above the firstminlevel will take the maximum between5and at most4, it’s clear that it’ll choose5. Following this, it would continue traversing the tree to perform the same exact set of operations within the tree’s other branches.
Below is the algorithmic representation of minimax with alpha-beta pruning.
Using this method provides an easy way to cut down on our AI agent’s search space. This way, alpha-beta pruning allows minimax to make just as good of decisions as minimax could do alone, but with a higher level of performance.
We’ve explored how to build our own AI agent that can play the game Isolation at a fairly competitive level. By using the minmax algorithm, we saw how the AI agent can model the game and can make decisions based on a heuristic score. We learnt how to determine a well-defined heuristic for our given task (Isolation).
But we also discovered that it would be far too computationally intense to let minimax just run wild. So we had to use techniques like iterative deepening and alpha-beta pruning. These would force our AI agent to come up with the next move within a reasonable amount of time. But what if we want our AI agent to have a higher win rate while at least being as responsive as a human?
Well, there are other techniques we could explore to increase our agent’s win rate as well as response time. We touched upon the idea of tweaking our heuristic score’s parameters (remember the “aggressive factor”?) We could even come up with a heuristic score better tailored for playing Isolation.
There are also other reflective properties related to the possible moves on the Isolation board. These become evident when we analyze the fully populated search tree. We could potentially slash a lot of branches from the search tree. Also, if we upgraded our hardware, our AI agent would be faster — and thus be able to explore more possibilities.