MCTS (Monte Carlo Tree Search) is a probabilistic algorithm for sequential decision-making that relies on repeated, lightweight random simulations.
This versatile algorithm requires no prior knowledge of the problem and has proved robust on high-dimensional problems. In games such as Go and Hex, it outperforms most other search algorithms.
In MCTS, the problems are represented as tree structures as depicted in the figure below:
As the algorithm progresses, it builds up an asymmetric tree in memory and becomes more accurate at estimating the value of the most promising moves.
Within the allotted time, the MCTS algorithm repeatedly performs the following four steps:
Selection: Starting from the root, the MCTS tree is traversed down to a leaf using some criterion (known as the tree policy) in order to select the most promising child at each level. Here, two competing operations come into play:
Exploitation: This task consists of selecting the move that leads to the best results so far.
Exploration: Because the evaluations are uncertain, the less promising moves must still be tried occasionally.
Expansion: After the selection process halts at a leaf node, the node is expanded and a new node is added as a child.
Simulation: A pseudo-random simulation (also called a rollout or playout) is run from the newly expanded node until some final state is reached (a state that has no successors). During the simulation, actions are selected by a simulation policy, which may be either weak or strong: a strong policy takes a more guided, heuristic approach to decision-making, while a weak policy has little to no planned strategy (for example, choosing moves uniformly at random).
Backpropagation: The results of the simulation are propagated from the new leaf node up through all of its ancestors to the root. This step ensures that the value of each node appropriately reflects the simulations carried out in the subtree it represents.
These four steps are iteratively applied until a decision is made. The following figure exhibits these steps:
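The balance between exploitation and exploration in the selection step is commonly struck with the UCB1 formula: each child is scored by its average reward plus an exploration bonus that shrinks the more often the child has been visited. A minimal sketch (the function name and parameters are illustrative, not part of a library):

```python
import math

def ucb1(total_reward, visits, parent_visits, c=1.41):
    """UCB1 score: average reward (exploitation) plus an
    exploration bonus that decays with the visit count."""
    exploitation = total_reward / visits
    exploration = c * math.sqrt(2 * math.log(parent_visits) / visits)
    return exploitation + exploration

# Two children with the same average reward (0.5): the less-visited
# one gets the higher score, so it is selected for exploration next.
rarely_tried = ucb1(total_reward=5, visits=10, parent_visits=100)
often_tried = ucb1(total_reward=50, visits=100, parent_visits=100)
```

With `c=0` the bonus vanishes and the score reduces to pure exploitation; larger `c` values push the search toward under-visited moves.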
MCTS has several advantages:
It doesn't require any knowledge of the given domain to make reasonable decisions.
It grows the tree asymmetrically to accommodate the topology of the search space, spending more effort on promising regions. With this in mind, MCTS is ideally suited for games with high branching factors.
At any time, the algorithm can be stopped to return the current best estimate.
The algorithm is relatively easy to implement.
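The anytime property mentioned above can be sketched with a simple time-budgeted loop (a generic illustration, not part of the MCTS implementation in this lesson): iterations run until the budget expires, and whenever the loop stops, the best estimate found so far is returned.

```python
import random
import time

def anytime_search(run_iteration, budget_seconds):
    # Keep running iterations until the time budget is spent;
    # whenever we stop, the best estimate seen so far is returned.
    deadline = time.monotonic() + budget_seconds
    best = None
    iterations = 0
    while time.monotonic() < deadline:
        candidate = run_iteration()
        if best is None or candidate > best:
            best = candidate
        iterations += 1
    return best, iterations

# Example: each "iteration" produces a noisy estimate in [0, 1);
# stopping early still yields the best estimate seen so far.
best, iterations = anytime_search(lambda: random.random(), 0.05)
```

In real MCTS, `run_iteration` would be one selection-expansion-simulation-backpropagation pass, and `best` the current best child of the root.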
MCTS also has some drawbacks:
For a large combinatorial move space, the MCTS algorithm, in its basic form, can fail to find reasonable moves in a reasonable time.
Searching and converging to a good result with MCTS can take many iterations, which can be an issue for applications that rely on speed.
The following example illustrates an implementation of Monte Carlo Tree Search in Python:
```python
import numpy as np
from collections import defaultdict

class MCTS:
    def __init__(self, current_node, parent_node=None, parent_node_action=None):
        # The current node (game state)
        self.current_node = current_node
        # The parent node (the parent of the root = None)
        self.parent_node = parent_node
        # The current node's children
        self.child_nodes = []
        self.parent_node_action = parent_node_action
        # The count of visits of the current node
        self._count_of_visits = 0
        self._results = defaultdict(int)
        self._results[1] = 0
        self._results[-1] = 0
        # The list of all transitions not yet tried
        self._unattempted_transitions = self.unattempted_transitions()

    def unattempted_transitions(self):
        # Return the list of unattempted transitions from the node.
        self._unattempted_transitions = self.current_node.get_allowed_transitions()
        return self._unattempted_transitions

    def q(self):
        # Return the difference wins - losses.
        wins = self._results[1]
        losses = self._results[-1]
        return wins - losses

    def n(self):
        # Return the number of times the node was visited.
        return self._count_of_visits

    def expand(self):
        # From the current node, determine the next node depending on
        # the popped action, wrap it in a new child node, and append
        # it to child_nodes.
        action = self._unattempted_transitions.pop()
        next_node = self.current_node.move(action)
        child_node = MCTS(next_node, parent_node=self, parent_node_action=action)
        self.child_nodes.append(child_node)
        return child_node

    def is_terminal_node(self):
        # Check whether the current node is the last tree node.
        return self.current_node.is_last_tree_node()

    def run_simulation(self):
        # From the current node, a simulation is performed until a
        # terminal state is reached.
        current_simulation = self.current_node
        while not current_simulation.is_last_tree_node():
            possible_moves = current_simulation.get_allowed_transitions()
            action = self.simulation_policy(possible_moves)
            current_simulation = current_simulation.move(action)
        return current_simulation.game_result()

    def backpropagate_results(self, result):
        # The statistics collected for the nodes are updated all the
        # way up: the number of visits is incremented by 1 for each
        # node visited, and the tally for the given result grows by 1.
        self._count_of_visits += 1
        self._results[result] += 1
        if self.parent_node:
            self.parent_node.backpropagate_results(result)

    def is_fully_expanded(self):
        # Transitions are popped out of _unattempted_transitions one
        # by one; when the list is empty, the node is fully expanded.
        return len(self._unattempted_transitions) == 0

    def best_child(self, c_param=0.1):
        # Once fully expanded, select the best child out of the
        # child_nodes array using the UCB1 formula.
        choices_weights = [
            (c.q() / c.n()) + c_param * np.sqrt(2 * np.log(self.n()) / c.n())
            for c in self.child_nodes
        ]
        return self.child_nodes[np.argmax(choices_weights)]

    def simulation_policy(self, possible_moves):
        # Select a move randomly out of the possible moves.
        return possible_moves[np.random.randint(len(possible_moves))]

    def _tree_policy(self):
        # Selection step: descend until an expandable or terminal
        # node is found.
        current_node = self
        while not current_node.is_terminal_node():
            if not current_node.is_fully_expanded():
                return current_node.expand()
            current_node = current_node.best_child()
        return current_node

    def best_action(self, no_simulation):
        # Return the node corresponding to the best possible move after
        # running the selection/expansion/simulation/backpropagation loop.
        for i in range(no_simulation):
            v = self._tree_policy()
            reward = v.run_simulation()
            v.backpropagate_results(reward)
        return self.best_child(c_param=0.1)

def main():
    no_simulation = 100
    # initial_current_node is the game-specific initial state; it must
    # provide get_allowed_transitions, move, is_last_tree_node,
    # and game_result.
    root = MCTS(current_node=initial_current_node)
    selected_node = root.best_action(no_simulation)
```
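The state object passed to MCTS is assumed to expose four methods: get_allowed_transitions(), move(action), is_last_tree_node(), and game_result(). As a hypothetical illustration of that interface (not part of the lesson's code), here is a minimal Nim-style subtraction game: a pile of stones, each player removes one or two, and whoever takes the last stone wins.

```python
class NimState:
    """Minimal game state implementing the interface the MCTS class
    expects: a pile of stones, remove 1 or 2 per turn, and the
    player who takes the last stone wins."""

    def __init__(self, remaining, player=1):
        self.remaining = remaining  # stones left in the pile
        self.player = player        # player to move: 1 or -1

    def get_allowed_transitions(self):
        # A move removes 1 or 2 stones, never more than remain.
        return [take for take in (1, 2) if take <= self.remaining]

    def move(self, action):
        # Apply the move and hand the turn to the other player.
        return NimState(self.remaining - action, -self.player)

    def is_last_tree_node(self):
        # The game is over once the pile is empty.
        return self.remaining == 0

    def game_result(self):
        # The player who just moved took the last stone and wins;
        # self.player is already the (losing) player to move.
        return -self.player

# With such a state class, the search would be launched as:
#   root = MCTS(current_node=NimState(10))
#   selected_node = root.best_action(no_simulation=100)
```

Any game can be plugged into the search the same way, as long as its state class provides these four methods.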
Let's go through the code above, method by method:
The imports: We import the required Python libraries (numpy and defaultdict).
The __init__ method: We initialize the node's game state, its parent, its children, its visit count, its result tallies, and the list of unattempted transitions.
The unattempted_transitions method: We get the list of unattempted transitions.
The q method: We get the node result as the difference between wins and losses.
The n method: We get the number of respective node visits.
The expand method: For a particular node, we determine the next node depending on the carried-out action, figure out the corresponding child node, and append it to the child_nodes array.
The is_terminal_node method: We check whether the current node is the last tree node.
The run_simulation method: Departing from the current node, we run a simulation and calculate the outcome based on a simulation policy.
The backpropagate_results method: We backpropagate results and update the visited nodes' statistics, incrementing by 1 the number of visits for each node visited.
The is_fully_expanded method: We verify that all the transitions were popped out and the node is fully expanded.
The best_child method: We select the best child out of the child_nodes array.
The simulation_policy method: We randomly select a move.
The _tree_policy method: We descend the tree, expanding the first node found that is not fully expanded; otherwise we follow the best child.
The best_action method: We carry out the steps of expansion, simulation, and backpropagation and return the best possible move.