AdaBoost (Adaptive Boosting) is an ensemble classification algorithm in machine learning that builds on decision trees. In AdaBoost, each weak learner is a tree restricted to a single level, called a decision stump. We train a sequence of these stumps and classify the dataset by combining their votes.
In AdaBoost, we first assign an equal weight to each data point in the dataset. Points that are wrongly classified are then assigned a higher weight, so the next model gives these points more importance. We keep training models this way until there are no misclassifications, or until a preset number of models is reached. Have a look at the flow diagram of AdaBoost.
This is how one model's weakness (its misclassifications) is passed on to the next model, which then gives more importance to the misclassified points.
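In practice, this whole loop is available off the shelf. The sketch below is an assumption-laden illustration: it encodes the toy dataset used later in this lesson (mapping Gender to 0/1 is our choice, not part of the data) and trains scikit-learn's `AdaBoostClassifier`, whose default weak learner is a depth-1 decision stump:

```python
# Minimal AdaBoost sketch with scikit-learn. The 0/1 encoding of Gender
# is an assumption made for illustration.
from sklearn.ensemble import AdaBoostClassifier

# Columns: Gender (Male=0, Female=1), Age, Income (in dollars)
X = [[0, 41, 400], [0, 54, 300], [1, 42, 250], [1, 40, 600], [0, 46, 500]]
y = [1, 0, 0, 1, 1]  # Sickness label

# The default weak learner is a decision stump (a one-level tree);
# each boosting round reweights the points the previous stump got wrong.
model = AdaBoostClassifier(n_estimators=10, random_state=0)
model.fit(X, y)
print(model.predict(X))
```

Each call to `fit` runs the reweighting loop described above internally; the rest of this lesson walks through one round of that loop by hand.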
Now let us discuss the algorithm in detail with an example.
Before diving into the working of the algorithm, consider the following dataset.
No. | Gender | Age | Income (In dollars) | Sickness |
1 | Male | 41 | 400 | 1 |
2 | Male | 54 | 300 | 0 |
3 | Female | 42 | 250 | 0 |
4 | Female | 40 | 600 | 1 |
5 | Male | 46 | 500 | 1 |
We have selected a small dataset because it makes the working of the algorithm easy to follow. The steps of the algorithm are as follows.
In the first step, we assign an equal weight to each data point. The formula for the initial sample weight is:

sample weight = 1 / N

Here, N is the total number of data points. With N = 5, every point starts with a weight of 1/5 = 0.2.
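This initialization step can be checked in a couple of lines of Python (a small sketch of the initialization only):

```python
# Each of the N data points starts with the same sample weight 1/N.
N = 5  # rows in the toy dataset
initial_weights = [1 / N] * N
print(initial_weights)  # every point starts at 0.2
```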
Next, we need a splitting criterion for the first stump. For this, we compute the Gini index of each feature and select the feature with the lowest value. Let us say that the Gini index for Gender is small compared to Age and Income. So our first stump will split on Gender.
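As an illustration of how a candidate stump is scored, the sketch below computes the weighted Gini index of the Gender split on the toy dataset; the helper functions are our own names, not part of any library:

```python
# Gini index of the Gender split on the toy dataset: (gender, sickness) pairs.
rows = [("Male", 1), ("Male", 0), ("Female", 0), ("Female", 1), ("Male", 1)]

def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p^2 - (1 - p)^2."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1 - p ** 2 - (1 - p) ** 2

def split_gini(rows, feature_value):
    """Impurity of a split, weighted by the size of each side."""
    left = [y for g, y in rows if g == feature_value]
    right = [y for g, y in rows if g != feature_value]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Weighted impurity of the Gender stump; lower is better.
print(round(split_gini(rows, "Male"), 4))
```

The same function applied to thresholds on Age and Income would let us compare all three candidate stumps and pick the lowest.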
We can calculate the importance of the classifier (also called its amount of say) with the help of the following equation:

importance (alpha) = (1/2) × ln((1 − Total Error) / Total Error)

Let us consider that the model misclassified one data point. The total error is then 1/5 = 0.2. Putting this value into the equation gives:

alpha = (1/2) × ln(0.8 / 0.2) = (1/2) × ln(4) ≈ 0.69

The smaller the total error, the larger the value of alpha, so a more accurate stump gets a bigger say in the final prediction. It is necessary to calculate the value of alpha because it is used to update the sample weights in the next step.
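The importance calculation is a one-liner; this sketch just reproduces the arithmetic:

```python
import math

total_error = 1 / 5  # one of the five points is misclassified
alpha = 0.5 * math.log((1 - total_error) / total_error)
print(round(alpha, 2))  # 0.69
```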
Weights in AdaBoost are updated with the help of the equations given below:

new weight = old weight × e^(+alpha)   for misclassified points
new weight = old weight × e^(−alpha)   for correctly classified points

The value of alpha is positive whenever the stump is right more often than wrong, so misclassified points get larger weights and correctly classified points get smaller weights. By putting alpha ≈ 0.69 into these equations, the misclassified point (row 4) gets 0.2 × e^(0.69) ≈ 0.3988, and each correctly classified point gets 0.2 × e^(−0.69) ≈ 0.1004. The updated weights are given as:
No. | Gender | Age | Income (In dollars) | Sickness | Previous sample weight | Updated sample weight |
1 | Male | 41 | 400 | 1 | 1/5 | 0.1004 |
2 | Male | 54 | 300 | 0 | 1/5 | 0.1004 |
3 | Female | 42 | 250 | 0 | 1/5 | 0.1004 |
4 | Female | 40 | 600 | 1 | 1/5 | 0.3988 |
5 | Male | 46 | 500 | 1 | 1/5 | 0.1004 |
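The update step behind this table can be reproduced directly. We use alpha rounded to 0.69, so the fourth decimal place may differ slightly from the table:

```python
import math

alpha = 0.69        # importance of the first stump (rounded)
old_weight = 1 / 5  # the initial sample weight

w_misclassified = old_weight * math.exp(alpha)   # row 4 was misclassified
w_correct = old_weight * math.exp(-alpha)        # the other four rows
print(round(w_misclassified, 4), round(w_correct, 4))
```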
We need to normalize the sample weights so that their sum is 1. Dividing each updated weight by their total (4 × 0.1004 + 0.3988 = 0.8004) gives:
No. | Gender | Age | Income (In dollars) | Sickness | Previous sample weight | Updated sample weight |
1 | Male | 41 | 400 | 1 | 1/5 | 0.1254 |
2 | Male | 54 | 300 | 0 | 1/5 | 0.1254 |
3 | Female | 42 | 250 | 0 | 1/5 | 0.1254 |
4 | Female | 40 | 600 | 1 | 1/5 | 0.4982 |
5 | Male | 46 | 500 | 1 | 1/5 | 0.1254 |
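Normalization simply divides each weight by the total, which can be checked as follows (again with alpha rounded to 0.69, so the last digit may differ slightly from the table):

```python
import math

alpha = 0.69
# Row 4 (index 3) was misclassified; the other rows were classified correctly.
updated = [0.2 * math.exp(alpha) if i == 3 else 0.2 * math.exp(-alpha)
           for i in range(5)]
total = sum(updated)
normalized = [w / total for w in updated]
print([round(w, 4) for w in normalized])
print(round(sum(normalized), 6))  # the normalized weights sum to 1
```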
We pass these normalized weights to the next model, which trains with them so that the previously misclassified points get more attention.
We keep iterating these steps, selecting a new stump by the Gini index, computing its importance, and updating the sample weights, and we stop when the combined model classifies every point correctly (or when a preset number of stumps is reached).
AdaBoost is a powerful model for binary classification. Because it combines many weak learners, it reduces bias, and in practice it is fairly resistant to overfitting, although it can be sensitive to noisy data and outliers. If you need to reinforce your understanding of AdaBoost, revise the concepts of decision trees and random forests.
Quiz: How does AdaBoost assign weights to the training instances?

A) It assigns equal weights to all instances.
B) It assigns higher weights to instances that are misclassified.
C) It assigns higher weights to instances that are correctly classified.
D) It assigns weights based on the feature importance.