Thompson Sampling for Multi-Armed Bandit Problem in Python
We’re going to look at how the multi-armed bandit problem can be solved in Python using Thompson sampling. Let us first understand what a multi-armed bandit is. A one-armed bandit is a slot machine: older machines had a handle (lever) on the right that you pulled to play. The multi-armed bandit problem is the challenge a player faces when confronted with a whole row of these machines. Suppose you’ve got seven of these machines and you have decided to play a thousand times. How do you figure out which ones to play to maximize your returns?
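To make the stakes concrete, here is a minimal sketch of the seven-machine scenario. The win probabilities are invented for illustration, and the helper name `play` is hypothetical. It compares pulling machines uniformly at random with always pulling the best machine, which a real player could only do if the probabilities were known in advance.

```python
import random

def play(probabilities, plays, choose, seed=0):
    """Simulate `plays` pulls and return the total number of wins.

    `choose` takes the random generator and returns the index
    of the machine to pull on this turn.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(plays):
        machine = choose(rng)
        # A pull pays out 1 with the machine's (hidden) win probability.
        if rng.random() < probabilities[machine]:
            wins += 1
    return wins

# Hypothetical win probabilities for seven machines (not from any real data).
PROBS = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.60]
best = PROBS.index(max(PROBS))

# Strategy 1: pick a machine uniformly at random on every pull.
random_wins = play(PROBS, 1000, lambda rng: rng.randrange(len(PROBS)))
# Strategy 2: always pull the best machine (requires knowing PROBS).
oracle_wins = play(PROBS, 1000, lambda rng: best)

print("uniform random play:", random_wins)
print("always the best machine:", oracle_wins)
```

The gap between the two totals is what a bandit algorithm tries to close while learning the probabilities from its own pulls.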
Thompson sampling for a multi-armed bandit problem
Let us study a modern-day application of Thompson sampling: optimizing the click-through rate of an advertisement.
The outline of the task is as follows:
- Similarly, we have eight versions of the same ad trying to sell a Lenovo mobile, one per column of the dataset used below.
- Each time a user of the social network logs into their account, we show one of these 8 ad versions.
- Importantly, we observe the user’s response: if the user clicks on the ad, we get a bonus equal to 1; otherwise, we get 0.
- We use Thompson sampling, a probabilistic algorithm, to optimize the click-through rate of the advertisement.
Prerequisites for implementing the code:
- You need Spyder (Python 3.7) or any other recent Python environment installed.
- You need a dataset file in comma-separated values (.csv) format, which can also be opened in MS Excel.
- Set the folder in which your dataset is stored as the working directory.
- You need to know the Python programming language.
Step by step implementation of the code:
1. Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
2. Importing the dataset
The dataset consists of 8 columns, each corresponding to a model number (one ad version). It has ‘N’ rows, and every cell contains either ‘1’ or ‘0’, which is the bonus that ad would earn in that round.
dataset = pd.read_csv('Adds_analysis.csv')
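The dataset file itself is not included here. If you do not have it, a stand-in with the same shape (8 columns of 0/1 values) can be generated. The sketch below uses only the standard library. The click rates, the column names, the file name `synthetic_ads.csv`, and both helper functions are assumptions made for illustration and are not part of the original dataset.

```python
import csv
import random

def make_synthetic_clicks(n_rows, click_rates, seed=42):
    """Return n_rows rows; each row holds one 0/1 bonus per ad."""
    rng = random.Random(seed)
    return [
        [1 if rng.random() < rate else 0 for rate in click_rates]
        for _ in range(n_rows)
    ]

def write_clicks_csv(path, rows):
    """Write the rows with a header line, since pd.read_csv treats
    the first line as column names by default."""
    header = [f"Ad {i + 1}" for i in range(len(rows[0]))]
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)

# Hypothetical click-through rates for the 8 ad versions.
CLICK_RATES = [0.05, 0.10, 0.08, 0.12, 0.25, 0.07, 0.03, 0.15]

rows = make_synthetic_clicks(9000, CLICK_RATES)
# Uncomment to save the file, then load it with pd.read_csv as above:
# write_clicks_csv("synthetic_ads.csv", rows)
print(len(rows), "rows x", len(rows[0]), "columns")
```

Because the true click rates are known for synthetic data, it is easy to check afterwards whether the algorithm settled on the ad with the highest rate.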
3. Implementing Thompson Sampling Algorithm in Python
First of all, we import Python’s built-in ‘random’ module, which provides draws from the beta distribution through random.betavariate. We initialize ‘m’, which is the number of models (ads), and ‘N’, which is the total number of users (rounds).
At each round, we need to consider two numbers for each ad ‘i’. The first is the number of times ad ‘i’ got a bonus of ‘1’ up to round ‘n’, and the second is the number of times ad ‘i’ got a bonus of ‘0’ up to round ‘n’.
import random
N = 9000
m = 8
model_selected = []
bonus_equal_to_1 = [0] * m
bonus_equal_to_0 = [0] * m
total_bonus = 0
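- For each ad ‘i’, we take a random draw fi(n) from a beta distribution whose parameters are the two counts described above, each increased by one: $f_i(n) \sim \mathrm{Beta}(N_i^1(n) + 1,\; N_i^0(n) + 1)$, where $N_i^1(n)$ and $N_i^0(n)$ are the number of times ad ‘i’ got a bonus of ‘1’ and ‘0’, respectively, up to round ‘n’.
- This is based on Bayesian inference with Bernoulli trials: each click or non-click is a Bernoulli trial, and the beta distribution describes our current belief about the ad’s click probability.
- We select the model that has the highest fi(n) value.
- In Python, we use the function random.betavariate, which returns a random draw from a beta distribution with the parameters we choose (here, bonus_equal_to_1[i] + 1 and bonus_equal_to_0[i] + 1).
For each ad, we take a random draw from its beta distribution and check whether this draw is higher than ‘max_count’, the highest draw seen so far in the current round. If it is, that ad becomes the current choice.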
for n in range(0, N):
    model = 0
    max_count = 0
    # Draw once from each ad's beta distribution and keep the largest draw
    for i in range(0, m):
        random_beta = random.betavariate(bonus_equal_to_1[i] + 1, bonus_equal_to_0[i] + 1)
        if random_beta > max_count:
            max_count = random_beta
            model = i
    model_selected.append(model)
    # Look up the bonus the selected ad earns in round n
    bonus = dataset.values[n, model]
    # Update the counts for the selected ad only
    if bonus == 1:
        bonus_equal_to_1[model] = bonus_equal_to_1[model] + 1
    else:
        bonus_equal_to_0[model] = bonus_equal_to_0[model] + 1
    total_bonus = total_bonus + bonus
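The reason the beta draws in the loop above work as a selection rule is that they become less spread out as an ad accumulates observations. The following sketch illustrates this with made-up counts that all have the same 20% click ratio. The helper `summarize_draws` is hypothetical and exists only for this demonstration.

```python
import random
import statistics

def summarize_draws(clicks, no_clicks, draws=5000, seed=1):
    """Return (mean, standard deviation) of repeated beta draws
    using the same parameters as the loop: count + 1 for each side."""
    rng = random.Random(seed)
    samples = [
        rng.betavariate(clicks + 1, no_clicks + 1) for _ in range(draws)
    ]
    return statistics.mean(samples), statistics.pstdev(samples)

# Illustrative counts: no data at all, then 10, 100 and 1000 observations.
for clicks, no_clicks in [(0, 0), (2, 8), (20, 80), (200, 800)]:
    mean, spread = summarize_draws(clicks, no_clicks)
    print(f"clicks={clicks:3d} no_clicks={no_clicks:3d} "
          f"mean={mean:.3f} spread={spread:.3f}")
```

With no data, the draws are spread over the whole range from 0 to 1, so every ad has a real chance of producing the highest draw and being tried. As counts grow, the draws cluster around the observed click ratio, so ads with a low ratio are selected less and less often.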
4. Plotting a histogram
plt.hist(model_selected)
plt.title('Histogram for the most liked ad')
plt.xlabel('model number of ads')
plt.ylabel('Number of times each ad was selected')
plt.show()
Complete code:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Adds_analysis.csv')

# Implementing Thompson Sampling algorithm
import random
N = 9000
m = 8
model_selected = []
bonus_equal_to_1 = [0] * m
bonus_equal_to_0 = [0] * m
total_bonus = 0
for n in range(0, N):
    model = 0
    max_count = 0
    for i in range(0, m):
        random_beta = random.betavariate(bonus_equal_to_1[i] + 1, bonus_equal_to_0[i] + 1)
        if random_beta > max_count:
            max_count = random_beta
            model = i
    model_selected.append(model)
    bonus = dataset.values[n, model]
    if bonus == 1:
        bonus_equal_to_1[model] = bonus_equal_to_1[model] + 1
    else:
        bonus_equal_to_0[model] = bonus_equal_to_0[model] + 1
    total_bonus = total_bonus + bonus

# Plotting a Histogram
plt.hist(model_selected)
plt.title('Histogram for the most liked ad')
plt.xlabel('model number of ads')
plt.ylabel('Number of times each ad was selected')
plt.show()
Results:
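As a result, the histogram shows how many times each ad was selected, and the ad selected most often is the preferred model. We can also check the results, such as ‘model_selected’ and ‘total_bonus’, in the variable explorer of Spyder.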
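Outside Spyder, the same results can be checked in plain text, and it helps to compare the total bonus against a baseline. The sketch below wraps the loop from this article in a function and runs it on simulated clicks rather than on the article’s dataset. The click rates, the function names, and the random-selection baseline are assumptions added for illustration.

```python
import random
from collections import Counter

def thompson_sampling(rewards, seed=0):
    """Run Thompson sampling over rows of 0/1 bonuses (one column per ad).

    Returns the list of selected ad indices and the total bonus.
    """
    rng = random.Random(seed)
    m = len(rewards[0])
    ones = [0] * m   # times each ad earned a bonus of 1
    zeros = [0] * m  # times each ad earned a bonus of 0
    selected = []
    total = 0
    for row in rewards:
        # One beta draw per ad; pick the ad with the highest draw.
        draws = [rng.betavariate(ones[i] + 1, zeros[i] + 1) for i in range(m)]
        ad = draws.index(max(draws))
        selected.append(ad)
        if row[ad] == 1:
            ones[ad] += 1
        else:
            zeros[ad] += 1
        total += row[ad]
    return selected, total

def random_selection(rewards, seed=0):
    """Baseline: show a uniformly random ad in every round."""
    rng = random.Random(seed)
    return sum(row[rng.randrange(len(row))] for row in rewards)

# Simulated clicks with hypothetical rates; ad index 4 is the best one.
RATES = [0.05, 0.10, 0.08, 0.12, 0.25, 0.07, 0.03, 0.15]
data_rng = random.Random(123)
rewards = [[1 if data_rng.random() < p else 0 for p in RATES]
           for _ in range(9000)]

selected, ts_total = thompson_sampling(rewards)
rs_total = random_selection(rewards)
counts = Counter(selected)

print("Thompson sampling total bonus:", ts_total)
print("random selection total bonus:", rs_total)
print("selections per ad:", sorted(counts.items()))
```

The `Counter` output carries the same information as the histogram: one count per ad index, with the largest count marking the ad the algorithm preferred.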