Gradient boosting is a technique used when building machine learning models. It is commonly called an ensemble method because it combines many decision trees to build a more robust and effective model: the trees are added sequentially, with each new tree trained to correct the errors of the ones before it, which is where the term boosting comes in. For classification models, the GradientBoostingClassifier is used, while the GradientBoostingRegressor is used for regression models. Both can be imported from the scikit-learn library.
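To make the idea concrete, here is a minimal, simplified sketch of the boosting loop for squared-error regression, written from scratch with plain decision trees. This is only an illustration of the mechanism, not scikit-learn's actual implementation, and the function name boosted_predictions is ours:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boosted_predictions(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    y = np.asarray(y, dtype=float)
    prediction = np.full(len(y), y.mean())  # start from the mean of the targets
    trees = []
    for _ in range(n_trees):
        residuals = y - prediction              # errors of the current ensemble
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)                  # the new tree learns to correct those errors
        prediction += learning_rate * tree.predict(X)  # shrink each tree's contribution
        trees.append(tree)
    return prediction, trees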
Given that we've created a dataset that has been split into X and y variables, we can implement gradient boosting regression as shown below:
import pandas as pd
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


# Creating a list of values for years_experience & salary
years_experience = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7, 3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0, 6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5]
salary = [39343.00, 46205.00, 37731.00, 43525.00, 39891.00, 56642.00, 60150.00, 54445.00, 64445.00, 57189.00, 63218.00, 55794.00, 56957.00, 57081.00, 61111.00, 67938.00, 66029.00, 83088.00, 81363.00, 93940.00, 91738.00, 98273.00, 101302.00, 113812.00, 109431.00, 105582.00, 116969.00, 112635.00, 122391.00, 121872.00]

# Create a DataFrame from the lists
df = pd.DataFrame({'years_experience': years_experience, 'salary': salary})

# Split the data into training and testing sets
X = df[['years_experience']]
y = df['salary']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a GradientBoostingRegressor model
model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
model.fit(X_train, y_train)

# Make a prediction on the test data
y_pred = model.predict(X_test)

# Compute and print the evaluation metrics
r2 = r2_score(y_test, y_pred)
print("Mean_absolute_error is: ", mean_absolute_error(y_test, y_pred))
print("R_squared score is: ", r2)
The code above demonstrates how to implement gradient boosting using the scikit-learn library:
Lines 1–6: We import the necessary libraries.
Lines 10–11: We assign lists of values to the variables years_experience and salary.
Line 14: We create a DataFrame from the two lists.
Lines 17–18: We split the dataset into the independent variable, X, and the dependent variable, y.
Line 19: We split the X and y variables into training and testing sets. The test size chosen is 0.2, with the random state set to 42. Because X is selected with double brackets, it remains a single-column DataFrame, which is the 2D shape scikit-learn expects, so no reshaping is needed (see the sketch after this list).
Line 22: We create an instance of GradientBoostingRegressor.
Line 23: We train the model on the training data.
Line 26: We make predictions on the test data using the model.predict() command.
Lines 29–31: We measure the r2_score and mean_absolute_error of our model and print the outputs to the console.
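To make the last two points concrete, here is a short usage sketch that continues from the code above; the 5.0-years candidate is a hypothetical value we introduce for illustration, not part of the original example:

# Double brackets keep X as a 2D, single-column DataFrame;
# single brackets would return a 1D Series, which scikit-learn rejects as X.
print(df[['years_experience']].shape)  # (30, 1)
print(df['years_experience'].shape)    # (30,)

# Predict the salary for a hypothetical candidate with 5 years of experience
new_candidate = pd.DataFrame({'years_experience': [5.0]})
print(model.predict(new_candidate))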
We implement GradientBoostingClassifier in the same way as GradientBoostingRegressor, following the steps outlined in the code above.
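As a minimal sketch of the classification case, the example below uses a synthetic dataset from make_classification, since the salary data above has no class labels, and accuracy_score as an illustrative metric; these choices are ours, not part of the original example:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Generate a small synthetic classification dataset
X, y = make_classification(n_samples=200, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a GradientBoostingClassifier with the same hyperparameters as the regressor above
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
clf.fit(X_train, y_train)

# Evaluate with a classification metric
y_pred = clf.predict(X_test)
print("Accuracy is: ", accuracy_score(y_test, y_pred))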