What is the XGBoost algorithm?

XGBoost (eXtreme Gradient Boosting) is a gradient boosting algorithm used in supervised machine learning. It belongs to the family of ensemble learning techniques, which combine the predictions of multiple models to produce a more accurate and robust final prediction. By combining the strengths of decision trees and boosting, XGBoost excels at structured, tabular data and is often a strong alternative to traditional ensemble methods such as Random Forests.

Note: XGBoost is used for both classification and regression tasks.
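To make the steps below concrete, here is a minimal, illustrative sketch of the xgboost Python package being used for both task types. The tiny dataset and the parameter values are assumptions chosen for demonstration, not recommendations.

```python
import numpy as np
from xgboost import XGBRegressor, XGBClassifier

# Toy data: two exam scores per student (illustrative values only).
X = np.array([[85, 90], [75, 65], [95, 85], [80, 70]])

# Regression: predict a continuous final-exam score.
y_reg = np.array([92, 78, 96, 85])
reg = XGBRegressor(n_estimators=100, learning_rate=0.3, max_depth=3)
reg.fit(X, y_reg)
print(reg.predict(X))

# Classification: predict a pass/fail label with the same API.
y_clf = np.array([1, 0, 1, 1])
clf = XGBClassifier(n_estimators=100, learning_rate=0.3, max_depth=3)
clf.fit(X, y_clf)
print(clf.predict(X))
```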

Here's a step-by-step explanation of how this process works:

Step 1: Initialize the model

XGBoost first initializes the model with a single decision tree. This initial tree typically consists of a single leaf node, and its prediction is the average target value of the training data:

$$\hat{y}_0 = \frac{1}{n}\sum_{i=1}^{n} y_i$$

where $\hat{y}_0$ is the initial prediction, $y_i$ is the target value of the $i^{th}$ training sample, and $n$ is the total number of training samples.

In the case of a classification problem, the mode of the target values in the sample data is taken as the initial prediction.

Let's consider a sample dataset to understand the algorithm.

| Exam1 | Exam2 | Final Exam |
| --- | --- | --- |
| 85 | 90 | 92 |
| 75 | 65 | 78 |
| 95 | 85 | 96 |
| 80 | 70 | 85 |

The initial prediction for this model is calculated as:

$$\hat{y}_0 = \frac{92 + 78 + 96 + 85}{4} = 87.75$$

The initial tree is therefore a single leaf node that predicts 87.75 for every sample.
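As a quick sanity check, the initial prediction can be reproduced with a few lines of plain Python/NumPy, using the values from the table above:

```python
import numpy as np

# Step 1: the initial prediction is simply the mean of the target column.
final_exam = np.array([92, 78, 96, 85])
y_hat_0 = final_exam.mean()
print(y_hat_0)  # 87.75
```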

Step 2: Make predictions

After initializing the model, we predict the output for the complete sample data. Here are the predictions for our example:

| Exam1 | Exam2 | Final Exam | Predicted Final Exam |
| --- | --- | --- | --- |
| 85 | 90 | 92 | 87.75 |
| 75 | 65 | 78 | 87.75 |
| 95 | 85 | 96 | 87.75 |
| 80 | 70 | 85 | 87.75 |

Step 3: Compute residuals

Next, we compute the residuals on the sample data. A residual is the difference between the actual target value and the initial prediction. The residual for each data point $i$ is given by:

$$r_i = y_i - \hat{y}_0$$

where $r_i$ is the residual for the $i^{th}$ data point, $y_i$ is the actual target value, and $\hat{y}_0$ is the initial prediction.

Here are the residuals for our dataset:

| Exam1 | Exam2 | Final Exam | Predicted Final Exam | Residual |
| --- | --- | --- | --- | --- |
| 85 | 90 | 92 | 87.75 | 4.25 |
| 75 | 65 | 78 | 87.75 | -9.75 |
| 95 | 85 | 96 | 87.75 | 8.25 |
| 80 | 70 | 85 | 87.75 | -2.75 |
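Steps 2 and 3 can be verified the same way: every row receives the same initial prediction, and the residual is the actual value minus that prediction. A short sketch using the values above:

```python
import numpy as np

final_exam = np.array([92, 78, 96, 85])

# Step 2: every sample gets the initial prediction of 87.75.
predictions = np.full(final_exam.shape, final_exam.mean())

# Step 3: residual = actual value - predicted value.
residuals = final_exam - predictions
print(residuals)  # [ 4.25 -9.75  8.25 -2.75]
```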

Step 4: Fit a new tree

XGBoost fits a new decision tree that predicts the error of the initial tree, using the residuals from the previous step. To fit the new tree, we pass every data row through the tree and write its residual on the leaf node it reaches.

Now, for every leaf node that holds more than one residual, we take the average. In our example, one leaf collects the residuals 4.25 and 8.25, so its output becomes (4.25 + 8.25) / 2 = 6.25, while the other leaf collects -9.75 and -2.75, giving an output of -6.25. The tree is updated so that each leaf stores its averaged residual.
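The sketch below uses a plain scikit-learn regression tree as a stand-in for XGBoost's internal tree builder (which additionally uses gradient and Hessian statistics plus regularization); it only illustrates the leaf-averaging idea with the residuals from our example:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[85, 90], [75, 65], [95, 85], [80, 70]])
residuals = np.array([4.25, -9.75, 8.25, -2.75])

# A single split produces two leaves; each leaf predicts the average
# residual of the rows that land in it.
tree = DecisionTreeRegressor(max_depth=1)
tree.fit(X, residuals)
print(tree.predict(X))  # [ 6.25 -6.25  6.25 -6.25]
```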

Step 5: Update predictions

The new decision tree's predictions are combined with the predictions of the previous model, making the model more accurate. The updated prediction is the sum of the previous prediction and a scaled fraction of the new tree's output:

$$\hat{y}_i^{\text{new}} = \hat{y}_i^{\text{old}} + \gamma \, T(x_i)$$

where $\hat{y}_i^{\text{new}}$ is the updated prediction, $\hat{y}_i^{\text{old}}$ is the prediction from the previous model, $\gamma$ is the learning rate, and $T(x_i)$ is the new tree's prediction for the input $x_i$.

To control the influence of each new tree, XGBoost introduces a learning rate. The learning rate scales the contribution of each tree, preventing the model from overfitting too quickly.

Let's take 0.3 as the learning rate for our example. The updated model is the initial prediction of 87.75 plus 0.3 times the output of the new tree. For our set of sample data, the updated predictions become:

| Exam1 | Exam2 | Final Exam | Updated Prediction |
| --- | --- | --- | --- |
| 85 | 90 | 92 | 87.75 + 0.3(6.25) = 89.625 |
| 75 | 65 | 78 | 87.75 + 0.3(-6.25) = 85.875 |
| 95 | 85 | 96 | 87.75 + 0.3(6.25) = 89.625 |
| 80 | 70 | 85 | 87.75 + 0.3(-6.25) = 85.875 |

Note that the updated predictions are closer to the actual final exam scores than the initial predictions were.
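The same update can be written out directly. This sketch assumes the leaf outputs of 6.25 and -6.25 from Step 4 and the learning rate of 0.3:

```python
import numpy as np

learning_rate = 0.3
old_predictions = np.array([87.75, 87.75, 87.75, 87.75])
tree_output = np.array([6.25, -6.25, 6.25, -6.25])  # leaf averages from Step 4

# Step 5: new prediction = old prediction + learning_rate * tree output.
new_predictions = old_predictions + learning_rate * tree_output
print(new_predictions)  # [89.625 85.875 89.625 85.875]
```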

Step 6: Determine terminal conditions

The boosting process continues iteratively, repeating Steps 3 to 5, until certain terminal conditions are met. These conditions can be a maximum number of boosting rounds (iterations) or a minimum threshold for performance improvement.

On each iteration, a tree is added to the model, and each successive tree tries to predict and correct the error of the model built before it. After termination, the final model is the initial prediction plus the scaled contributions of all the fitted trees:

$$\hat{y}_i = \hat{y}_0 + \gamma \sum_{k=1}^{K} T_k(x_i)$$

where $K$ is the number of boosting rounds and $T_k$ is the tree fitted in round $k$.
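Putting Steps 1 through 6 together, a compact (and deliberately simplified) boosting loop might look like the sketch below. It again uses scikit-learn trees as stand-ins for XGBoost's regularized trees and stops after a fixed number of rounds:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, n_rounds=50, learning_rate=0.3, max_depth=1):
    base = y.mean()                         # Step 1: initialize with the mean
    y_hat = np.full(len(y), base)           # Step 2: predictions for every row
    trees = []
    for _ in range(n_rounds):               # Step 6: stop after n_rounds iterations
        residuals = y - y_hat               # Step 3: errors of the current model
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)              # Step 4: fit a tree to the residuals
        y_hat = y_hat + learning_rate * tree.predict(X)  # Step 5: update predictions
        trees.append(tree)
    return base, trees, y_hat

X = np.array([[85, 90], [75, 65], [95, 85], [80, 70]])
y = np.array([92.0, 78.0, 96.0, 85.0])
base, trees, fitted = boost(X, y)
print(fitted)  # close to the actual scores [92, 78, 96, 85]
```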

Features of XGBoost

  1. Ensemble learning: XGBoost combines the predictions of multiple weak learners (decision trees) to create a strong predictive model.

  2. Tree pruning: XGBoost applies pruning techniques to limit the depth of trees, which reduces overfitting and improves generalization.

  3. Handling missing values: XGBoost can handle missing values during tree construction without the need for imputation (see the sketch after this list).

  4. Feature importance and selection: XGBoost automatically handles feature selection and can capture feature interactions, making it effective in handling high-dimensional datasets.

  5. Parallel and distributed computing: XGBoost is optimized for parallel and distributed computing, making it efficient for large-scale datasets.
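Two of these features are easy to see in code. The sketch below (illustrative values, xgboost's scikit-learn wrapper assumed) passes NaNs directly to the model and reads the fitted feature importances:

```python
import numpy as np
from xgboost import XGBRegressor

# Missing values are passed as NaN; no imputation step is required.
X = np.array([[85.0, 90.0], [75.0, np.nan], [95.0, 85.0], [np.nan, 70.0]])
y = np.array([92.0, 78.0, 96.0, 85.0])

model = XGBRegressor(n_estimators=50, learning_rate=0.3, max_depth=2, n_jobs=-1)
model.fit(X, y)

print(model.predict(X))
print(model.feature_importances_)  # relative importance of Exam1 and Exam2
```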

Conclusion

XGBoost is a popular and powerful machine learning algorithm thanks to its strong performance, scalability, and interpretability. It efficiently handles structured data, provides valuable feature-importance insights, and excels at complex predictive modeling tasks. Understanding XGBoost can give us a competitive edge in solving real-world problems and making more informed, data-driven decisions.

