What is Monte-Carlo Tree Search (MCTS)?

MCTS is a probabilistic algorithm for sequential decision-making that relies on repeated, lightweight random simulations.

This versatile algorithm requires no prior knowledge of the problem and has proven robust on high-dimensional problems. It underpins some of the strongest computer players of games such as Go and Hex.

In MCTS, the problems are represented as tree structures as depicted in the figure below:

Schema of the Monte Carlo tree search


As the algorithm progresses, it builds up an asymmetric tree in memory and becomes more accurate at estimating the value of the most promising moves.

The four core steps of the MCTS

Within the allotted time, the MCTS algorithm repeatedly performs the following four steps:

  • Selection: The MCTS tree is traversed from the root down to a leaf, at each level choosing a child according to a criterion known as the tree policy. Two competing goals come into play:

    • Exploitation: Selecting the move that has led to the best results so far.

    • Exploration: Because the evaluations are uncertain, less promising moves must still be tried.

  • Expansion: After the selection process halts at a leaf node, the node is expanded and a new node is added as a child.

  • Simulation: A pseudo-random simulation is run from the selected node until a final state is reached (a state that has no successors). During the simulation, the actions are selected by a simulation policy. This policy may be either weak or strong: a strong policy takes a more guided approach to decision-making, while a weak policy has little to no planned strategy.

  • Backpropagation: The results of the simulation are used to update the new leaf node and all its ancestors, all the way up to the root. This step ensures that the value of each node accurately reflects the simulations carried out in the subtree it represents.

These four steps are iteratively applied until a decision is made. The following figure exhibits these steps:

The four core steps of the MCTS

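The selection step's exploitation–exploration trade-off is commonly resolved with the UCT (Upper Confidence bounds applied to Trees) score: a child's average reward plus a bonus that grows for rarely visited children. A minimal sketch of the formula (the toy visit counts and rewards below are made up for illustration):

```python
import math

def uct_score(child_q, child_n, parent_n, c_param=1.41):
    """UCB1-style score: mean reward (exploitation) plus an
    exploration bonus that is larger for rarely visited children."""
    return child_q / child_n + c_param * math.sqrt(2 * math.log(parent_n) / child_n)

# Toy example: two children of a parent visited 100 times.
# Child A: good average reward, visited often (small bonus).
# Child B: worse average reward, visited rarely (large bonus).
a = uct_score(child_q=60, child_n=80, parent_n=100)
b = uct_score(child_q=5, child_n=10, parent_n=100)
# Here the exploration bonus makes the rarely visited child B win.
```

With a small `c_param`, selection leans toward exploitation; a larger value pushes the search toward under-explored branches.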

Benefits of MCTS

  • It doesn't require any knowledge of the given domain to make reasonable decisions.

  • It performs asymmetric tree growth that adapts to the topology of the search space. This makes MCTS well suited to games with high branching factors.

  • At any time, the algorithm can be stopped to return the current best estimate.

  • The algorithm is relatively easy to implement.
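The anytime property from the list above can be exploited with a simple time-budgeted driver loop. A minimal sketch, assuming `one_iteration` stands in for one select/expand/simulate/backpropagate pass (both names here are hypothetical placeholders, not part of the code example below):

```python
import time

def run_mcts_with_budget(root, budget_seconds, one_iteration):
    """Run MCTS iterations until the time budget expires and return
    how many iterations completed. `root` is the search tree's root;
    `one_iteration` performs one full MCTS pass on it."""
    deadline = time.monotonic() + budget_seconds
    iterations = 0
    while time.monotonic() < deadline:
        one_iteration(root)
        iterations += 1
    return iterations

# Hypothetical usage with a no-op iteration and a 10 ms budget:
count = run_mcts_with_budget(root=None, budget_seconds=0.01,
                             one_iteration=lambda r: None)
```

Whenever the deadline fires, the tree built so far still yields the current best estimate, so the caller can simply ask for the best child at that point.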

Drawbacks of MCTS

  • For a large combinatorial move space, the MCTS algorithm, in its basic form, can fail to find reasonable moves in a reasonable time.

  • Searching and converging to a good result with MCTS can take many iterations, which can be an issue for applications that rely on speed.

Coding example

This example illustrates a skeleton implementation of Monte Carlo Tree Search in Python. It assumes a game-state object passed as current_node that provides the methods get_allowed_transitions(), move(), is_last_tree_node(), and game_result():

import numpy as np
from collections import defaultdict


class MCTS():
    def __init__(self, current_node, parent_node=None, parent_node_action=None):
        # The current node
        self.current_node = current_node
        # The parent node (the parent of the root is None)
        self.parent_node = parent_node
        # The children of the current node
        self.child_nodes = []
        self.parent_node_action = parent_node_action
        # The count of visits of the current node
        self._count_of_visits = 0
        self._results = defaultdict(int)
        self._results[1] = 0
        self._results[-1] = 0
        # The list of all possible transitions not yet tried
        self._unattempted_transitions = self.unattempted_transitions()

    def unattempted_transitions(self):
        # Return the list of unattempted transitions from a given node.
        self._unattempted_transitions = self.current_node.get_allowed_transitions()
        return self._unattempted_transitions

    def q(self):
        # Return the difference between wins and losses.
        wins = self._results[1]
        losses = self._results[-1]
        return wins - losses

    def n(self):
        # Return the number of times this node was visited.
        return self._count_of_visits

    def expand(self):
        # From the current node, determine the next node depending on the
        # carried-out action, wrap it in a child node, and append it to
        # child_nodes.
        action = self._unattempted_transitions.pop()
        next_node = self.current_node.move(action)
        child_node = MCTS(next_node, parent_node=self,
                          parent_node_action=action)
        self.child_nodes.append(child_node)
        return child_node

    def is_terminal_node(self):
        # Check whether the current node is the last tree node.
        return self.current_node.is_last_tree_node()

    def run_simulation(self):
        # From the current node, a simulation is performed until a final
        # state is reached; actions are chosen by the simulation policy.
        current_simulation = self.current_node
        while not current_simulation.is_last_tree_node():
            possible_moves = current_simulation.get_allowed_transitions()
            action = self.simulation_policy(possible_moves)
            current_simulation = current_simulation.move(action)
        return current_simulation.game_result()

    def backpropagate_results(self, result):
        # The statistics collected for the nodes are updated all the way up:
        # the number of visits is incremented by 1 for each node visited,
        # and the counter matching the result parameter is incremented by 1.
        self._count_of_visits += 1
        self._results[result] += 1
        if self.parent_node:
            self.parent_node.backpropagate_results(result)

    def is_fully_expanded(self):
        # Transitions are popped out of _unattempted_transitions one by one.
        # When the list is empty, the node is fully expanded.
        return len(self._unattempted_transitions) == 0

    def best_child(self, c_param=0.1):
        # Once fully expanded, select the best child out of the child_nodes
        # array using the UCT score (exploitation term plus exploration bonus).
        choices_weights = [
            (c.q() / c.n()) + c_param * np.sqrt(2 * np.log(self.n()) / c.n())
            for c in self.child_nodes
        ]
        return self.child_nodes[np.argmax(choices_weights)]

    def simulation_policy(self, possible_moves):
        # Select a move randomly out of the possible moves.
        return possible_moves[np.random.randint(len(possible_moves))]

    def _tree_policy(self):
        # Walk down the tree: expand the first node that is not fully
        # expanded; otherwise descend through the best children.
        current_node = self
        while not current_node.is_terminal_node():
            if not current_node.is_fully_expanded():
                return current_node.expand()
            else:
                current_node = current_node.best_child()
        return current_node

    def best_action(self, no_simulation):
        # Return the child node corresponding to the best possible move.
        for i in range(no_simulation):
            v = self._tree_policy()
            reward = v.run_simulation()
            v.backpropagate_results(reward)
        return self.best_child(c_param=0.1)


def main():
    no_simulation = 100
    # initial_current_node must be a game-state object supplied by the caller.
    root = MCTS(current_node=initial_current_node)
    selected_node = root.best_action(no_simulation)

Let's go through the code above, method by method:

  • The imports bring in numpy, used by the selection formula and the random simulation policy, and defaultdict, used to store the win/loss counters.

  • unattempted_transitions(): Returns the list of transitions that have not yet been tried from a given node.

  • q(): Returns the node's result as the number of wins minus the number of losses.

  • n(): Returns the number of times the node was visited.

  • expand(): For a particular node, determines the next node depending on the carried-out action, wraps it in a new child node, and appends it to the child_nodes array.

  • is_terminal_node(): Checks whether the current node is the last tree node.

  • run_simulation(): Starting from the current node, runs a simulation until a final state is reached and returns the outcome; actions are chosen by the simulation policy.

  • backpropagate_results(): Updates the statistics of the visited nodes all the way up to the root, incrementing the visit count and the matching result counter by 1 for each node.

  • is_fully_expanded(): Verifies that all transitions were popped out of _unattempted_transitions; when the list is empty, the node is fully expanded.

  • best_child(): Once a node is fully expanded, selects the best child out of the child_nodes array.

  • simulation_policy(): Randomly selects a move out of the possible moves.

  • _tree_policy(): Walks down the tree, expanding the first node that is not fully expanded and otherwise descending through the best child.

  • best_action(): Carries out the steps of selection, expansion, simulation, and backpropagation no_simulation times and returns the best possible move.
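The MCTS class above is generic: it never inspects the game directly, only calling four methods on the state object passed as current_node. A hypothetical minimal state satisfying that interface (the "count to 3" game and the class name are invented for illustration):

```python
class CountToThreeState:
    """Toy one-player game state exposing the interface the MCTS class
    expects: get_allowed_transitions(), move(), is_last_tree_node(),
    and game_result()."""

    def __init__(self, total=0):
        self.total = total

    def get_allowed_transitions(self):
        # The player may add 1 or 2 to the running total.
        return [1, 2]

    def move(self, action):
        # Applying an action yields a new, independent state.
        return CountToThreeState(self.total + action)

    def is_last_tree_node(self):
        # The game ends once the total reaches 3 or more.
        return self.total >= 3

    def game_result(self):
        # Landing exactly on 3 is a win (+1); overshooting is a loss (-1).
        return 1 if self.total == 3 else -1


state = CountToThreeState()
next_state = state.move(1)
```

Passing such an object as initial_current_node in main() is all that is needed for best_action to run end to end.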



Copyright ©2025 Educative, Inc. All rights reserved