Bagging vs. boosting

Bagging and boosting are popular ensemble techniques in Machine Learning. Both combine the results of several models, producing a model that is more stable than any of its individual members.

Ensemble techniques refer to the approach of combining several Machine Learning models to produce a model with better predictive performance than any single model.

Bagging

The bagging (bootstrap aggregating) technique is used to reduce the variance of decision tree classifiers. The method involves randomly creating several subsets of the dataset with replacement. Each subset is then used to train a decision tree, resulting in an ensemble of different decision trees. The predictions from all these models are averaged (or, for classification, combined by majority vote) to give a more robust result than a single decision tree.

Since the models are trained on different subsets of the same data, their individual errors tend to cancel out when their predictions are combined, which decreases the ensemble's variance.

Bagging creates subsets of data which are then used to train separate models.
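
As a minimal sketch of this idea (assuming scikit-learn is available; the dataset here is a synthetic stand-in for any tabular classification task), `BaggingClassifier` fits one decision tree per bootstrap sample and combines their votes:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for any tabular classification dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# BaggingClassifier draws random subsets with replacement (bootstrap
# samples) and fits one decision tree (its default base estimator) on
# each; predictions are combined by majority vote.
bagging = BaggingClassifier(n_estimators=50, random_state=42)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", accuracy_score(y_test, bagging.predict(X_test)))
```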

Boosting

The boosting ensemble technique trains classifiers sequentially and adjusts the weights of the observations after each round. The data points that a classifier mispredicts are given greater weight when training the next tree, so each succeeding model depends on the previous one. If a tree misclassifies an input, that input's weight is increased so that the next tree is more likely to classify it correctly. By combining these trees, weak decision trees are iteratively converted into a better-performing model.
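
A comparable sketch for boosting (again assuming scikit-learn; AdaBoost is one common boosting algorithm, chosen here purely for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# AdaBoost fits shallow trees one after another; after each round it
# raises the weights of misclassified samples so the next tree focuses
# on them, then combines all the trees by a weighted vote.
boosting = AdaBoostClassifier(n_estimators=50, random_state=42)
boosting.fit(X_train, y_train)
print("Boosting accuracy:", accuracy_score(y_test, boosting.predict(X_test)))
```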

Differences between the two methods

The main purpose of both techniques is to create a model that is more stable than a single-tree model.

Where a single tree performs poorly because of high bias (underfitting), bagging rarely yields a better prediction, since every model in the ensemble is trained on subsets of the same data and shares that bias. In such situations, boosting can be used to iteratively combine weak models into a high-performing model with lower error.

Where a single model overfits (high variance), bagging is the better choice: it trains multiple models on different subsets of the data and averages their results, which reduces the variance. The sketch below puts these trade-offs side by side.
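
As a rough, hypothetical comparison (exact scores depend entirely on the dataset), the snippet below cross-validates a single tree against both ensembles on the same synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Cross-validated accuracy of a single tree vs. the two ensembles
for name, model in [
    ("single tree", DecisionTreeClassifier(random_state=42)),
    ("bagging", BaggingClassifier(n_estimators=50, random_state=42)),
    ("boosting", AdaBoostClassifier(n_estimators=50, random_state=42)),
]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```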
