This article will teach you about the minimax algorithm and alpha-beta pruning, from a beginner's perspective.
This article aims at providing the reader with an introduction to the minimax search algorithm, and to alpha-beta pruning – an optimisation over that same algorithm.
I am writing this article as a complete beginner with regards to this topic, which hopefully will benefit you and me:
The article will introduce the theoretical concepts needed to understand the minimax algorithm, as well as the alpha-beta pruning optimisation.
I will be implementing the algorithms in Python. Feel free to join me in writing some Python code, do it in any other language you prefer, or just skip the coding parts altogether!
On top of this, we will be taking a look at these algorithms from the perspective of a game. That is, we want to implement these algorithms so that we can use them as artificial intelligence algorithms to play games against humans. Therefore, the language I will be using will also revolve around that: “players”, “moves”, “scores”, “winning/losing”, etc.
So, without further ado, let's start!
Like I just said, I am a complete beginner at this topic. The only things I have done are as follows:
If you only want to watch one, I recommend you watch this one. If you have the time and patience to watch both, then start with this one and only then watch this.
The minimax algorithm is the algorithm around which this whole article revolves, so it is best if we take some time to really understand it.
In a short, but unhelpful sentence, the minimax algorithm tries to maximise my score, while taking into account the fact that you will do your best to minimise my score.
Suppose that you and me are playing Tic Tac Toe, I'm the crosses (X
),
it's my turn to play, and the game looks like this:
I have to pick a move and, for that, I analyse all of my possibilities:
So, I play out these three moves inside my head, and I see that none of the moves gives me a win. Therefore, there is no obvious choice that I should make.
What does this mean? It means I need to keep playing out the moves inside my head, and I need to predict what you would do in each situation:
Now, this reveals something interesting! If we look at the third and fifth games of the bottom row, we see that you could win the game:
This shows that we probably don't want to go down those paths! But you and me are humans, and we are taking a look at the whole drawing all at once. Let's just play out the rest of the other alternative games to see what could happen:
Now that we played out the whole game in our heads, we need to see what choices each player will make. This is straightforward:
So, in order to evaluate all these alternatives, I have to start at the bottom. If I manage to reach any of the games in the fourth row, we will end in a draw:
Now, I need to evaluate the game positions in the third row. In other words, I need to look at the third row, and figure out how good each position is for myself. What does that depend on? It depends on how well I can do after we reach that position.
What we see is that some games in the third row will result in a draw and some in a loss for me:
Now, I need to go up one level and examine the games in the second row. Remember, all I am doing is trying to figure out what move I should make. In order to do that, I played out all possibilities in my head, and am now trying to predict how you will play.
If I want to examine the positions in the second row, now I have to think like you! Why is that? Because the arrows from the second row to the third represent your moves. What does that mean? It means that I am the one who chooses to which position of the second row we go, but you are the one who picks where to go next!
For example, if I decide to make the middle move, what would you do in response?
If I decide to make the middle move, then we can ignore the left and right parts of the sketch, and you can focus on only two positions from the third row. One of them makes me lose, the other one ends in a draw. Which one do you pick?
The one that makes me lose, obviously!
Therefore, when I examine the second row, I need to figure out what's the worst-case scenario, for me, for each position. Why? Because if we do go to that position in the second row, I know you will pick what is best for you/worst for me.
This means that the second row of the diagram ends up looking like this:
Finally, this diagram brings some clarity! Now I understand that:
Thus, the obvious choice is to go with the move on the left.
In order to practise what we are still trying to grasp about the minimax algorithm, we will implement it. Trying to implement the algorithm will push you towards understanding, so let's do it.
In order to focus on the details of the minimax algorithm, we will abstract away the context of a game, and instead will focus on the tree structure of the sketch I showed above.
Let me take the sketch above and refactor it:
I replaced all the specific drawings by circles. When it is my turn, the circle has an arrow pointing up: that's because I want to increase my score as much as possible. When it is your turn, the circle has an arrow pointing down: that's because you want to decrease my score as much as possible.
Then, I gave a score of 0 to the positions that ended in a draw and a score of -1 to the positions that ended in a loss.
What happens next is that we need to make the information flow upwards, so that I can make a move. When the information flows upwards, we need to fill in the circles:
This corresponds to:
This is the structure that we want to work with.
In order to solidify our knowledge, let's make a dummy implementation of the minimax algorithm.
To make it as simple as possible, we will implement the minimax algorithm over a tree. The tree will have nodes that branch out, and it will have terminal positions with fixed values. Our job is to implement an algorithm that traverses the tree and figures out which moves will be played out.
Here is the simple tree structure, in Python:
class Choice:
def __init__(self, left, right):
self.left = left
self.right = right
class Terminal:
def __init__(self, value):
self.value = value
With these two classes, we can build a tree like the following:
tree = Choice(
Choice(
Terminal(9),
Terminal(5),
),
Choice(
Terminal(-3),
Terminal(-2),
)
)
This corresponds to the following tree:
Now we implement our minimax algorithm. The algorithm needs to accept the tree root node of the tree structure and a Boolean flag to tell which one of us is playing.
Why is that?
Because, when I go down the tree, I want to call the function recursively. As we go down the tree, we switch back and forth between being my turn or your turn and, as we do that, we switch the objective of the algorithm:
Then, this is the structure of the algorithm:
All in all, the code looks like this:
def minimax(tree, maximising_player):
if isinstance(tree, Choice):
lv = minimax(tree.left, not maximising_player)
rv = minimax(tree.right, not maximising_player)
if maximising_player:
return max(lv, rv)
else:
return min(lv, rv)
else:
return tree.value
Notice that we use not maximising_player
when calling minimax
recursively
to switch back and forth between the players.
Now, we can call the minimax
function with the previous tree
, and see what we get.
If we start by saying that maximising_player = True
,
that means that the top of the tree has an arrow pointing up.
In that case, the result should be 5
:
If we call minimax
with maximising_player = False
,
that means that the top of the tree has an arrow pointing down.
In that case, the result should be -2
:
To check this, add the two calls to your script and run it:
print(minimax(tree, True))
print(minimax(tree, False))
The dummy minimax algorithm we implemented above worked on trees with a very specific structure. Now we will try to make it slightly more generic, by allowing tree nodes to have an arbitrary number of children.
For that, we can start by improving the classes that represent the tree structure:
class Tree:
def __init__(self, children):
self.children = children
class Terminal(Tree):
def __init__(self, value):
# A terminal state is a tree with no children:
super().__init__([])
self.value = value
Now that each tree may have multiple subtrees, we can no longer evaluate the left and right
subtrees by hand.
Instead, we need to use a for
loop to traverse all the subtrees.
We also need to distinguish between the case when we are maximising and minimising:
def minimax(tree, maximising_player):
if isinstance(tree, Terminal):
return tree.value
if maximising_player:
max_ = float("-inf")
for subtree in tree.children:
max_ = max(minimax(subtree, not maximising_player), max_)
return max_
else:
min_ = float("+inf")
for subtree in tree.children:
min_ = min(minimax(subtree, not maximising_player), min_)
return min_
However, we can see that there is some duplicated code.
What we need to do is realise that the code under the if
statement
has the exact same structure as the code under the else
,
it's just that we start with a different value and use a different function
to update the running value.
By noticing that, we can rewrite the function to look like this:
def minimax(tree, maximising_player):
if isinstance(tree, Terminal):
return tree.value
val, func = (float("-inf"), max) if maximising_player else (float("+inf"), min)
for subtree in tree.children:
val = func(minimax(subtree, not maximising_player), val)
return val
The code above uses a conditional expression to initialise
the running value and the function that we use to update the value.
The for
loop that follows has the same structure that the two loops
in the previous version of minimax
.
Personally, I would go one step further and rewrite the function as
def minimax(tree, maximising_player):
if isinstance(tree, Terminal):
return tree.value
v, f = (float("-inf"), max) if maximising_player else (float("+inf"), min)
return f((minimax(sub, not maximising_player) for sub in tree.children), default=v)
Or, now that I did, maybe I wouldn't..? If you want to study Python, try to understand why this function still works.
Now, what we need to do is test our generic minimax
function.
Let's create a tree:
tree = Tree([
Tree([
Tree([
Terminal(3),
Terminal(4),
]),
Tree([
Terminal(8),
Tree([
Terminal(-2),
Terminal(10),
]),
Terminal(5),
])
]),
Terminal(7),
])
which corresponds to this tree:
Now, take your time to walk through this tree and figure out what
should be the result of applying the minimax
algorithm
to the tree, both as the maximising player and not as the maximising player.
This is what you should get:
print(minimax(tree, True)) # 7
print(minimax(tree, False)) # 5
Now that we have a basic understanding of the minimax algorithm, let's introduce the alpha-beta pruning optimisation.
But first, why?
So far, we only applied the minimax algorithm to very small trees. For trees of this size, the algorithm has no trouble going through the whole tree. However, suppose we were playing chess.
In chess, players can generally make plenty of different moves, which means that the trees that I have been drawing would get very huge, very fast. On top of that, chess matches last for much longer than just three or four moves, which means that the trees can also get very deep.
In short, it is a lot of work to traverse the whole tree.
Having said this, it would be nice if we could optimise our minimax algorithm in some way. For example, what if we managed to avoid traversing certain parts of the tree, if we know those parts are irrelevant? That would be great!
But how could we ever realise that a certain part of the tree is irrelevant..?
Let's look at the heterogeneous tree from before, from the point of view of the maximising player:
And now that we are looking at the tree, let's try to evaluate the tree.
We will start at the bottom left, where there is a node that is maximising,
and gets to choose between a 3
and a 4
.
Obviously, that node picks the 4
:
Now it's time to evaluate the node to the right of that one, which I highlighted here:
Not only did I highlight the next node, but I also drew a little arrow
from the terminal value 8
to that highlighted node.
Think with me:
8
is connected to the highlighted node.Therefore, we know that the highlighted node will evaluate
to 8
or higher.
However, the node immediately above is a minimising node:
What can we tell, then? We already know that the minimising node will pick the path on the left! Why? Because that node wants to minimise the score, and it has two alternatives:
4
; or8
or more!If you want to keep the score as low as possible, what alternative do you pick? The first one, of course! And, in doing this, you managed to completely ignore a portion of the tree:
This is the essence of alpha-beta pruning!
The intuitive explanation of alpha-beta pruning is, well, fairly intuitive. Now, we need to turn it into something objective, so that we can translate it into code.
In the example above, why were we able to ignore a part of the tree? We managed to ignore a part of the tree because, at some point, we realised that the maximising node would result in a move that is too good for the minimising node immediately above, which already knows of a move that has a lower score.
Slow down and re-read the paragraph above, please.
To make sure you understand, here is a small tree:
The root node is seen from the maximiser's perspective, there is a terminal node on the left (represented by the question mark) and there is a subtree to the right.
In turn, that subtree is seen from the minimiser's perspective and has a terminal node on the left, whose value is 5. To its right, the remainder of the subtree is represented by an ellipsis (“...”).
Question: what value(s) could the ?
have,
which would allow us to completely ignore the “...” subtree to the right,
after we find the terminal node of value 5?
I'll help you out. The minimiser node sees a 5, so what do we know? We know that the minimiser node will evaluate to 5 or less:
Immediately above it, is a maximiser node, who already visited the ?
node.
The maximiser node has two options:
?
; orIf it happens that ?
is better than a 5 or less,
then we can ignore the “...” subtree and just go with the option on the left!
In other words, ?
can be any value larger than 5:
Was this easy? If not, re-read it carefully, grab a pen and a piece of paper, and make the drawings, the arrows, etc.
Let's try another one (bear with me, these two exercises are handy!):
In this new exercise, the root node is seen from the minimiser's perspective, and contains two children. On the left, a terminal node with an unknown value. On the right, a subtree.
In turn, the subtree is seen from the maximiser's perspective, and contains three children:
Question: what values can the subtree ?
have so that these
two restrictions apply:
Let's work this out. When we first reach the terminal 5, we know that the maximiser node will evaluate to 5 or more:
The maximiser node is 5 or more, but we don't want the algorithm to stop.
The algorithm would be able to stop if 5 were already too large
when compared to ?
because, in that case,
the minimiser node would know that going left is always better.
Therefore, ?
has to be greater than 5:
If you can't see why, imagine that the terminal node is less than 5. For example, imagine it's 4. Then convince yourself that, in that case, the algorithm could stop as soon as the 5 is hit.
Alright, so if the left terminal is greater than 5, the algorithm doesn't stop and then looks at the 7. When it does, it realises that the maximiser node will evaluate to 7 or more:
But, at this point, we wanted our algorithm to be able to stop. Therefore, the terminal node on the far left can't be that high. It had to be high enough so that the 5 wouldn't stop the algorithm, but it has to be low enough so that the 7 can stop the algorithm!
Therefore, if the left terminal has a value between 5 (exclusive) and 7 (inclusive), we satisfy the restrictions of my problem statement.
For example, if the left terminal had a value of 6, then the restrictions of the problem statement would be satisfied:
These two exercises we just did had us thinking about the algorithm in an interesting way: we were thinking about what the algorithm had to have seen earlier on, so that it could stop at a particular point in time.
Both exercises had the same focus: when evaluating a node \(n\), when can I know the node above will never pick \(n\)? From the exercises above, we figured this out:
So, for us to be able to optimise our minimax algorithm, we need to find a way to keep track of two things:
I know this is too many words, so let's focus on one case:
If a minimising node \(n\) has a value of \(v\) or lower, and the maximising node above knows a path with a value greater than or equal to \(v\), then we can stop.
Therefore, when we are inside a minimising node, we need to know what's the value of the highest node that the maximising node above has seen.
This is what we call “alpha” (\(\alpha\)), in alpha-beta pruning.
\(\alpha\) will be the best value that a maximiser node has found elsewhere. (The “best” value of a maximiser is also the highest.)
When we are inside a minimising node, we run through its children, evaluating them, and updating the value \(v\) to always be the lowest one found so far. If, at any point in time, \(v \leq \alpha\), we can stop.
We can stop because we know that the maximiser node above will have an option that is worth \(\alpha\). If the current minimiser has a value of \(v\), that can only go down (because we are in a minimising node), then we can save ourselves the trouble to trying to get \(v\) to be even lower. We will never go down that subtree, either way.
In contrast, we use “beta” (\(\beta\)) to keep track of the lowest option that a minimiser node has found so far. When a maximiser node finds its value above \(\beta\), we can stop.
The first step to implementing alpha-beta pruning is modifying
the minimax
algorithm so that it also accepts values
for alpha
and beta
,
which can have default values of \(-\infty\) and \(+\infty\), respectively:
def pruning(tree, maximising_player, alpha=float("-inf"), beta=float("+inf")):
...
Then, we need to make sure that these values are passed down the recursion calls:
def pruning(tree, maximising_player, alpha=float("-inf"), beta=float("+inf")):
if isinstance(tree, Terminal):
return tree.value
val, func = (float("-inf"), max) if maximising_player else (float("+inf"), min)
for subtree in tree.children:
val = func(pruning(subtree, not maximising_player, alpha, beta), val)
return val
But these are useless if we never really use them to prune branches of the tree. So, we can do the checks that we talked about earlier:
beta
; andalpha
.This gives us this code:
def pruning(tree, maximising_player, alpha=float("-inf"), beta=float("+inf")):
if isinstance(tree, Terminal):
return tree.value
val, func = (float("-inf"), max) if maximising_player else (float("+inf"), min)
for subtree in tree.children:
val = func(pruning(subtree, not maximising_player, alpha, beta), val)
if (maximising_player and val >= beta) or (not maximising_player and val <= alpha):
break
return val
This implements the logic to prune the branches, but it won't prune anything yet.
Why not?
Because we never update alpha
and beta
, that's why!
Updating them is similar to updating val
,
we just need to distinguish between the maximising/minimising cases:
def pruning(tree, maximising_player, alpha=float("-inf"), beta=float("+inf")):
if isinstance(tree, Terminal):
return tree.value
val, func = (float("-inf"), max) if maximising_player else (float("+inf"), min)
for subtree in tree.children:
val = func(pruning(subtree, not maximising_player, alpha, beta), val)
if maximising_player:
alpha = max(alpha, val)
else:
beta = min(beta, val)
if (maximising_player and val >= beta) or (not maximising_player and val <= alpha):
break
return val
And that's it! We can make it cleaner, but let's take it for a test drive.
First, let's make sure that it gives the correct results:
tree = Tree([
Tree([
Tree([
Terminal(3),
Terminal(4),
]),
Tree([
Terminal(8),
Tree([
Terminal(-2),
Terminal(10),
]),
Terminal(5),
])
]),
Terminal(7),
])
print(pruning(tree, True)) # 7
print(pruning(tree, False)) # 5
Now, we need to make sure that it actually prunes a couple of things. We can check that with ease.
First, implement the dunder method __str__
on the Tree
and Terminal
classes.
This allows them to be printed in a nice way:
class Tree:
def __init__(self, children):
self.children = children
def __str__(self):
return f"Tree({', '.join(str(sub) for sub in self.children)})"
class Terminal(Tree):
def __init__(self, value):
super().__init__([])
self.value = value
def __str__(self):
return f"T({self.value})"
If you need to read up on what __str__
does, take a look at this Pydon't article.
With this in place, printing the tree
above should result in this:
print(tree)
# Tree(Tree(Tree(T(3), T(4)), Tree(T(8), Tree(T(-2), T(10)), T(5))), T(7))
Now, we add a print(tree)
statement at the top of the pruning
algorithm.
This way, we can see what trees/terminals are being visited by the function:
def pruning(tree, maximising_player, alpha=float("-inf"), beta=float("+inf")):
print(tree)
...
Remember that, in the maximising case, we could avoid visiting a subtree:
For example, the terminal with value 10 should never be visited. Let's run the algorithm as the maximising player, and see what the output is:
print(pruning(tree, True))
""" Output is:
Tree(Tree(Tree(T(3), T(4)), Tree(T(8), Tree(T(-2), T(10)), T(5))), T(7))
Tree(Tree(T(3), T(4)), Tree(T(8), Tree(T(-2), T(10)), T(5)))
Tree(T(3), T(4))
T(3)
T(4)
Tree(T(8), Tree(T(-2), T(10)), T(5))
T(8)
T(7)
7
"""
In the output above, each line corresponds to a recursive call.
When the output just looks like T(?)
, that's because we reached a terminal... and, as you can see, a couple of terminals were never visited!
The terminals T(-2)
, T(10)
, and T(5)
, were never visited!
The tree Tree(T(-2), T(5))
was also never visited, as expected.
Amazing!
The algorithm is working just fine, but it doesn't look as good as it could!
Yet!
The first thing we can do is refactor the condition that allows us to break out of the loop:
def pruning(tree, maximising_player, alpha=float("-inf"), beta=float("+inf")):
...
for subtree in tree.children:
...
if maximising_player:
alpha = max(alpha, val)
else:
beta = min(beta, val)
if (maximising_player and val >= beta) or (not maximising_player and val <= alpha):
break
return val
Notice that when we are the maximising player,
alpha
keeps getting updated to get increasingly higher values,
that come from val
.
Then, we check if val >= beta
.
When we are not the maximising player,
beta
keeps getting updated to get increasingly lower values,
that come from val
.
Then, we check if val <= alpha
.
We can combine these two cases together,
and just write beta <= alpha
:
def pruning(tree, maximising_player, alpha=float("-inf"), beta=float("+inf")):
if isinstance(tree, Terminal):
return tree.value
val, func = (float("-inf"), max) if maximising_player else (float("+inf"), min)
for subtree in tree.children:
val = func(pruning(subtree, not maximising_player, alpha, beta), val)
if maximising_player:
alpha = max(alpha, val)
else:
beta = min(beta, val)
if beta <= alpha:
break
return val
By the way,
now that we have an explicit if maximising_player
statement,
we can get rid of the assignment to func
:
def pruning(tree, maximising_player, alpha=float("-inf"), beta=float("+inf")):
if isinstance(tree, Terminal):
return tree.value
val = float("-inf") if maximising_player else float("+inf")
for subtree in tree.children:
sub_val = pruning(subtree, not maximising_player, alpha, beta)
if maximising_player:
val = max(val, sub_val)
alpha = max(alpha, sub_val)
else:
val = min(val, sub_val)
beta = min(beta, sub_val)
if beta <= alpha:
break
return val
In this article, you:
We implemented alpha-beta pruning over explicit trees, but I said I wanted to use this to play games.
Therefore, I need to implement a game, and then create an abstraction that allows the alpha-beta pruning algorithm to work its magic without having to worry about the exact game that we implemented.
Then, we piece everything together!
This will be material for a couple of future articles, so subscribe to my newsletter not to miss them.
+35 chapters. +400 pages. Hundreds of examples. Over 30,000 readers!
My book “Pydon'ts” teaches you how to write elegant, expressive, and Pythonic code, to help you become a better developer. >>> Download it here 🐍🚀.