Thompson sampling, also known as the Bayesian bandit algorithm, is a widely used technique in decision-making and reinforcement learning. This algorithm tackles the classic "multi-armed bandit" problem, where a gambler must decide which slot machine (or "arm") to pull to maximize their cumulative reward over time.
Imagine facing a row of slot machines. Each machine has a different and unknown probability of winning, representing its reward distribution. The objective is to identify the machine with the highest probability of winning and to maximize the total winnings by repeatedly pulling arms over time.
Thompson sampling takes a probabilistic approach to tackle the multi-armed bandit problem. At its core, the algorithm maintains a probability distribution for each arm, representing its underlying probability of winning. By maintaining probability distributions for each arm and continually updating them with new information, Thompson sampling efficiently explores and exploits different options, making it a powerful technique for smarter decision-making under uncertainty.
import numpy as npimport matplotlib.pyplot as pltimport pandas as pditems = pd.read_csv('Data.csv')import randomtotal_rounds = 10000ads = 10selection = []track_1 = [0] * adstrack_0 = [0] * adsfinal = 0for round in range(0, total_rounds):chosen = 0probability = 0for index in range(0, ads):score = random.betavariate(track_1[index] + 1, track_0[index] + 1)if score > probability:probability = scorechosen = indexselection.append(chosen)received = items.values[round, chosen]if received == 1:track_1[chosen] = track_1[chosen] + 1else:track_0[chosen] = track_0[chosen] + 1final = final + receivedplt.hist(selection)plt.title('Frequency of Ad selections using Thompson sampling')plt.xlabel('Ads')plt.ylabel('Number of selections')plt.savefig('output/image.png')plt.show()
Line 1–3: Importing the necessary libraries. We use numpy
for numerical computations, matplotlib.pyplot
for creating plots, and pandas
for handling data from the CSV file.
Line 5: Load the data from the CSV file Data.csv
into a pandas DataFrame called items
.
Line 8: Initializing total_rounds
to specify the number of rounds the algorithm will run.
Line 9: Initializing ads
to represent the number of arms (ads) to choose from.
Line 10: Initializing selection
to keep track of selected arms.
Line 11–12: Initializing track_1
and track_0
to store the number of times each arm received a reward or no reward.
Line 13: Initializing final
to accumulate the total rewards.
Line 14–22: Perform Thompson sampling to select arms in each round.
Line 23–28: Update reward tracking based on feedback.
Line 30–35: Plot the histogram of selected arms using Thompson sampling.
The x-axis
of the histogram represents the different arms (ads) available for selection, and the y-axis
represents the number of times each arm (ad) was selected during the total number of rounds (specified as total_rounds
in the code). The higher the count for a particular arm, the more frequently it was chosen by the Thompson sampling algorithm.
Thompson sampling has applications in various domains, including:
Online advertising: Optimizing the allocation of resources to different ad campaigns to maximize returns.
Clinical trials: Efficiently testing treatments or medications to identify the most effective.
Recommendation systems: Personalizing recommendations based on user interactions to improve engagement.
Resource allocation: Allocating resources like server capacity or inventory to maximize overall utility.
Thompson sampling solves the multi-armed bandit problem efficiently. By employing Bayesian statistics and probability distributions, Thompson sampling balances between exploring new options and exploiting well-known ones.
Free Resources