XGBoost (eXtreme Gradient Boosting) is an open-source machine-learning library known for its strong performance on structured/tabular data. Based on gradient boosting, it combines the predictions of several weak learners (usually decision trees) to create a strong predictive model.
The xgb.Booster.get_dump() function
The xgb.Booster.get_dump() function is provided by XGBoost to obtain the textual representation of the underlying decision trees in a trained booster (model).
It provides transparency into the trained model to examine the individual decision trees’ details, visualize them, and gain insights into how the ensemble model makes predictions.
Note: You can learn more about plotting decision trees from the ensemble here.
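As a quick illustration of that note, XGBoost also provides a plot_tree() helper that renders a single tree graphically. The following is a minimal sketch that assumes you already have a trained model named model and that matplotlib and graphviz are installed:

import matplotlib.pyplot as plt
import xgboost as xgb

# Draw the first tree (index 0) of the trained ensemble
xgb.plot_tree(model, num_trees=0)
plt.show()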
The syntax for the xgb.Booster.get_dump() function is given below:
dump_list = booster.get_dump(with_stats=False, dump_format='text')
with_stats is an optional parameter set to False by default. If True, per-node statistics, such as the gain of each split and the number of samples (cover) a node handles, are included in the output.
dump_format is an optional parameter that specifies the output format. It can be text, json, or json_raw. A short sketch of both parameters is given below.
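For illustration, here is a minimal sketch of how these parameters can be combined, assuming booster is an already trained xgb.Booster (for example, obtained via model.get_booster()):

# Plain-text dump that includes gain/cover statistics for every node
text_dump = booster.get_dump(with_stats=True, dump_format='text')

# JSON dump of the same trees, convenient for programmatic parsing
json_dump = booster.get_dump(dump_format='json')

print(text_dump[0])  # first tree with statistics
print(json_dump[0])  # first tree as JSON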
Note: Make sure you have the XGBoost library installed. You can learn more about installing XGBoost on your system here.
Let’s demonstrate the use of xgb.Booster.get_dump() with the following code:
import xgboost as xgb
import numpy as np

# Creating a synthetic dataset
np.random.seed(42)
X = np.random.rand(100, 3)
y = np.random.randint(0, 2, 100)

# Creating an XGBoost classifier
model = xgb.XGBClassifier()

# Training the model on the dataset
model.fit(X, y)

# Getting the textual representation of the decision trees
dump_list = model.get_booster().get_dump()

# Printing the output
for tree_num in range(1):
    print("Tree {}:\n{}".format(tree_num, dump_list[tree_num]))
Line 1–2: Firstly, we import the necessary xgb and np modules for this code example.
Line 5–7: Next, we create a small synthetic dataset with 100 samples and 3 features using the np.random.rand() and np.random.randint() functions. The target variable y is binary, with values 0 or 1.
Line 10: In this line, we create an XGBoost classifier with default hyperparameters and store it in the variable model.
Line 13: Here, we train the model on the entire synthetic dataset X and y using the fit() method.
Line 16: Now, we use the get_booster() method to access the underlying booster (ensemble) from the trained model, and get_dump() is called to get the textual representation of the individual decision trees.
Line 19–20: Finally, we loop over the trees we want to inspect and print their textual representation.
Since there are 100 trees in the ensemble (the default number of boosting rounds for XGBClassifier), for convenience we display the text for the first tree only. Upon execution, the code prints the textual representation of the first decision tree in our trained XGBoost classifier. The tree is shown in a human-readable format with its split conditions and leaf values.
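As a follow-up, we can confirm how many trees were dumped and, if needed, write the full dump to a file using the booster's dump_model() method. The snippet below is a sketch that continues from the example above:

booster = model.get_booster()

# One dump string per boosting round (100 by default for XGBClassifier)
print(len(dump_list))

# Write all trees, including per-node statistics, to a text file
booster.dump_model('tree_dump.txt', with_stats=True)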
In conclusion, the xgb.Booster.get_dump() function is an invaluable tool for understanding and visualizing the individual decision trees in an XGBoost ensemble model. It gives a clear view of each tree's structure, which helps with interpreting, troubleshooting, and validating the model's predictions.