MCTS (Monte Carlo Tree Search) is a probabilistic algorithm for sequential decision-making that relies on repeated, lightweight random simulations.
This versatile algorithm requires no prior knowledge of the problem and has proved robust on high-dimensional problems. In games such as Go and Hex, it outperforms most other search algorithms.
In MCTS, the problems are represented as tree structures as depicted in the figure below:
As the algorithm progresses, it builds up an asymmetric tree in memory and becomes more accurate at estimating the value of the most promising moves.
Within the allotted time, the MCTS algorithm repeatedly performs the following four steps:
Selection: Starting from the root, the MCTS tree is traversed down to a leaf using some criterion (known as the tree policy) in order to select the most promising child at each level. Here, two competing operations come into play:
Exploitation: This task consists of selecting the move that leads to the best results so far.
Exploration: Because the evaluations are uncertain, the less promising moves must still be tried occasionally.
Expansion: After the selection process halts at a leaf node, the node is expanded and a new node is added as a child.
Simulation: A pseudo-random simulation (also called a rollout or playout) is run from the newly expanded node until some final state is reached (a state that has no successors). During the simulation, actions are selected by a simulation policy, which may be either weak or strong: a strong policy takes a more guided, heuristic approach to decision-making, while a weak policy has little to no planned strategy (for example, choosing moves uniformly at random).
Backpropagation: The results of the simulation are propagated from the new leaf node up through all of its ancestors to the root. This step ensures that the value of each node appropriately reflects the simulations carried out in the subtree it represents.
These four steps are iteratively applied until a decision is made. The following figure exhibits these steps:
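The balance between exploitation and exploration in the selection step is commonly struck with the UCB1 formula: each child is scored by its average reward plus an exploration bonus that shrinks the more often the child has been visited. A minimal sketch (the function name and parameters are illustrative, not part of a library):

```python
import math

def ucb1(total_reward, visits, parent_visits, c=1.41):
    """UCB1 score: average reward (exploitation) plus an
    exploration bonus that decays with the visit count."""
    exploitation = total_reward / visits
    exploration = c * math.sqrt(2 * math.log(parent_visits) / visits)
    return exploitation + exploration

# Two children with the same average reward (0.5): the less-visited
# one gets the higher score, so it is selected for exploration next.
rarely_tried = ucb1(total_reward=5, visits=10, parent_visits=100)
often_tried = ucb1(total_reward=50, visits=100, parent_visits=100)
```

With `c=0` the bonus vanishes and the score reduces to pure exploitation; larger `c` values push the search toward under-visited moves.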
MCTS has several advantages:
It doesn't require any knowledge of the given domain to make reasonable decisions.
It grows the tree asymmetrically to accommodate the topology of the search space, spending more effort on promising regions. With this in mind, MCTS is ideally suited for games with high branching factors.
At any time, the algorithm can be stopped to return the current best estimate.
The algorithm is relatively easy to implement.
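The anytime property mentioned above can be sketched with a simple time-budgeted loop (a generic illustration, not part of the MCTS implementation in this lesson): iterations run until the budget expires, and whenever the loop stops, the best estimate found so far is returned.

```python
import random
import time

def anytime_search(run_iteration, budget_seconds):
    # Keep running iterations until the time budget is spent;
    # whenever we stop, the best estimate seen so far is returned.
    deadline = time.monotonic() + budget_seconds
    best = None
    iterations = 0
    while time.monotonic() < deadline:
        candidate = run_iteration()
        if best is None or candidate > best:
            best = candidate
        iterations += 1
    return best, iterations

# Example: each "iteration" produces a noisy estimate in [0, 1);
# stopping early still yields the best estimate seen so far.
best, iterations = anytime_search(lambda: random.random(), 0.05)
```

In real MCTS, `run_iteration` would be one selection-expansion-simulation-backpropagation pass, and `best` the current best child of the root.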
MCTS also has some drawbacks:
For a large combinatorial move space, the MCTS algorithm, in its basic form, can fail to find reasonable moves in a reasonable time.
Searching and converging to a good result with MCTS can take many iterations, which can be an issue for applications that rely on speed.
The following example illustrates an implementation of Monte Carlo Tree Search in Python:
```python
import numpy as np
from collections import defaultdict

class MCTS:
    def __init__(self, current_node, parent_node=None, parent_node_action=None):
        # The current node (game state)
        self.current_node = current_node
        # The parent node (the parent of the root = None)
        self.parent_node = parent_node
        # The current node's children
        self.child_nodes = []
        self.parent_node_action = parent_node_action
        # The count of visits of the current node
        self._count_of_visits = 0
        self._results = defaultdict(int)
        self._results[1] = 0
        self._results[-1] = 0
        # The list of all transitions not yet tried
        self._unattempted_transitions = self.unattempted_transitions()

    def unattempted_transitions(self):
        # Return the list of unattempted transitions from the node.
        self._unattempted_transitions = self.current_node.get_allowed_transitions()
        return self._unattempted_transitions

    def q(self):
        # Return the difference wins - losses.
        wins = self._results[1]
        losses = self._results[-1]
        return wins - losses

    def n(self):
        # Return the number of times the node was visited.
        return self._count_of_visits

    def expand(self):
        # From the current node, determine the next node depending on
        # the popped action, wrap it in a new child node, and append
        # it to child_nodes.
        action = self._unattempted_transitions.pop()
        next_node = self.current_node.move(action)
        child_node = MCTS(next_node, parent_node=self, parent_node_action=action)
        self.child_nodes.append(child_node)
        return child_node

    def is_terminal_node(self):
        # Check whether the current node is the last tree node.
        return self.current_node.is_last_tree_node()

    def run_simulation(self):
        # From the current node, a simulation is performed until a
        # terminal state is reached.
        current_simulation = self.current_node
        while not current_simulation.is_last_tree_node():
            possible_moves = current_simulation.get_allowed_transitions()
            action = self.simulation_policy(possible_moves)
            current_simulation = current_simulation.move(action)
        return current_simulation.game_result()

    def backpropagate_results(self, result):
        # The statistics collected for the nodes are updated all the
        # way up: the number of visits is incremented by 1 for each
        # node visited, and the tally for the given result grows by 1.
        self._count_of_visits += 1
        self._results[result] += 1
        if self.parent_node:
            self.parent_node.backpropagate_results(result)

    def is_fully_expanded(self):
        # Transitions are popped out of _unattempted_transitions one
        # by one; when the list is empty, the node is fully expanded.
        return len(self._unattempted_transitions) == 0

    def best_child(self, c_param=0.1):
        # Once fully expanded, select the best child out of the
        # child_nodes array using the UCB1 formula.
        choices_weights = [
            (c.q() / c.n()) + c_param * np.sqrt(2 * np.log(self.n()) / c.n())
            for c in self.child_nodes
        ]
        return self.child_nodes[np.argmax(choices_weights)]

    def simulation_policy(self, possible_moves):
        # Select a move randomly out of the possible moves.
        return possible_moves[np.random.randint(len(possible_moves))]

    def _tree_policy(self):
        # Selection step: descend until an expandable or terminal
        # node is found.
        current_node = self
        while not current_node.is_terminal_node():
            if not current_node.is_fully_expanded():
                return current_node.expand()
            current_node = current_node.best_child()
        return current_node

    def best_action(self, no_simulation):
        # Return the node corresponding to the best possible move after
        # running the selection/expansion/simulation/backpropagation loop.
        for i in range(no_simulation):
            v = self._tree_policy()
            reward = v.run_simulation()
            v.backpropagate_results(reward)
        return self.best_child(c_param=0.1)

def main():
    no_simulation = 100
    # initial_current_node is the game-specific initial state; it must
    # provide get_allowed_transitions, move, is_last_tree_node,
    # and game_result.
    root = MCTS(current_node=initial_current_node)
    selected_node = root.best_action(no_simulation)
```
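The state object passed to MCTS is assumed to expose four methods: get_allowed_transitions(), move(action), is_last_tree_node(), and game_result(). As a hypothetical illustration of that interface (not part of the lesson's code), here is a minimal Nim-style subtraction game: a pile of stones, each player removes one or two, and whoever takes the last stone wins.

```python
class NimState:
    """Minimal game state implementing the interface the MCTS class
    expects: a pile of stones, remove 1 or 2 per turn, and the
    player who takes the last stone wins."""

    def __init__(self, remaining, player=1):
        self.remaining = remaining  # stones left in the pile
        self.player = player        # player to move: 1 or -1

    def get_allowed_transitions(self):
        # A move removes 1 or 2 stones, never more than remain.
        return [take for take in (1, 2) if take <= self.remaining]

    def move(self, action):
        # Apply the move and hand the turn to the other player.
        return NimState(self.remaining - action, -self.player)

    def is_last_tree_node(self):
        # The game is over once the pile is empty.
        return self.remaining == 0

    def game_result(self):
        # The player who just moved took the last stone and wins;
        # self.player is already the (losing) player to move.
        return -self.player

# With such a state class, the search would be launched as:
#   root = MCTS(current_node=NimState(10))
#   selected_node = root.best_action(no_simulation=100)
```

Any game can be plugged into the search the same way, as long as its state class provides these four methods.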
Let's go through the code above, method by method:
The imports: We import the required Python libraries (numpy and defaultdict).
The __init__ method: We initialize the node's game state, its parent, its children, its visit count, its result tallies, and the list of unattempted transitions.
The unattempted_transitions method: We get the list of unattempted transitions.
The q method: We get the node result as the difference between wins and losses.
The n method: We get the number of respective node visits.
The expand method: For a particular node, we determine the next node depending on the carried-out action, figure out the corresponding child node, and append it to the child_nodes array.
The is_terminal_node method: We check whether the current node is the last tree node.
The run_simulation method: Departing from the current node, we run a simulation and calculate the outcome based on a simulation policy.
The backpropagate_results method: We backpropagate results and update the visited nodes' statistics, incrementing by 1 the number of visits for each node visited.
The is_fully_expanded method: We verify that all the transitions were popped out and the node is fully expanded.
The best_child method: We select the best child out of the child_nodes array.
The simulation_policy method: We randomly select a move.
The _tree_policy method: We descend the tree, expanding the first node found that is not fully expanded; otherwise we follow the best child.
The best_action method: We carry out the steps of expansion, simulation, and backpropagation and return the best possible move.