Ensemble methods in machine learning leverage the power of combining multiple models to enhance overall performance. This approach is particularly effective when individual models might have limitations or biases.
One prominent ensemble technique is stacking, a method where the predictions of diverse base models are combined through a meta-model, resulting in a more robust and accurate overall prediction. Stacking excels in scenarios where various models can offer complementary insights and contribute to improved predictive capabilities.
Here’s a step-by-step implementation of the stacking classifier using Python:
The first step is to import the required libraries.
from sklearn.ensemble import StackingClassifierfrom sklearn.model_selection import train_test_splitfrom sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifierfrom sklearn.linear_model import LogisticRegressionfrom sklearn.datasets import load_irisfrom sklearn.metrics import accuracy_score
The next step is to load the dataset. We’ll use the breast cancer dataset provided by the sklearn
library.
cancer = load_breast_cancer()X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=42)
The next step is to choose the base models. The stacking classifier uses multiple models stacked on top of each other. We’ll use RandomForestClassifier
and GradientBoostingClassifier
for this example.
base_models = [('rf', RandomForestClassifier(n_estimators=10, random_state=42)),('gb', GradientBoostingClassifier(n_estimators=10, random_state=42))]
After selecting the base models, we’ll choose a meta-model. We’ll use LogisticRegression
for this example.
meta_model = LogisticRegression()
We’ll now create an instance for the StackingClassifier
and fit the training data to train the model.
stacking_model = StackingClassifier(estimators=base_models, final_estimator=meta_model)stacking_model.fit(X_train, y_train)
Now, we’ll make the predictions on the test set and calculate accuracy.
y_pred = stacking_model.predict(X_test)accuracy = accuracy_score(y_test, y_pred)print("Accuracy: {:.2f}%".format(accuracy * 100))
The following code shows how we can implement the stacking
ensemble classifier in Python:
# Import necessary librariesfrom sklearn.ensemble import StackingClassifierfrom sklearn.model_selection import train_test_splitfrom sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifierfrom sklearn.linear_model import LogisticRegressionfrom sklearn.datasets import load_breast_cancerfrom sklearn.metrics import accuracy_score# We will load and split the datasetcancer = load_breast_cancer()X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=42)# We will define base modelsbase_models = [('gb', GradientBoostingClassifier(n_estimators=100, random_state=42)),('rf', RandomForestClassifier(n_estimators=100, random_state=42))]# We will define the meta-modelmeta_model = LogisticRegression()# We will implement Stackingstacking_model = StackingClassifier(estimators=base_models, final_estimator=meta_model)stacking_model.fit(X_train, y_train)# We will predict using the stacking modely_pred = stacking_model.predict(X_test)# We will evaluate the modelaccuracy = accuracy_score(y_test, y_pred)print("Accuracy: {:.2f}%".format(accuracy * 100))
Lines 1–7: These lines import the required libraries.
Line 10: This line loads the breast cancer dataset from sklearn
and stores it in the data
variable.
Line 11: This line splits the dataset into train and test sets.
Lines 14–17: We define RandomForestClassifier
and GradientBoostingClassifier
as the base models for the StackingClassifier.
Line 20: We create a meta-model that combines the predictions of the base models. We’ve chosen LogisticRegression
in this example.
Lines 23–24: These lines create a StackingClassifier
with specified base models and meta-model. Fit the stacking model to the training data.
Line 27: The trained model is used to make predictions on the test data.
Lines 30–31: The code calculates the accuracy of the model’s predictions by comparing them to the true labels in the test set. The accuracy is printed as a percentage.
Unlock your potential: Ensemble learning series, all in one place!
To continue your exploration of ensemble learning, check out our series of Answers below:
What is ensemble learning?
Understand the concept of combining multiple models to improve predictions.
Ensemble methods in Python: Averaging
Learn how averaging methods can boost model accuracy and stability.
Ensemble methods in Python: Bagging
Discover the power of bagging in reducing variance and enhancing prediction performance.
Ensemble methods in Python: Boosting
Dive into boosting techniques that improve weak models by focusing on mistakes.
Ensemble methods in Python: Stacking
Understand how stacking combines multiple models to make better predictions.
Ensemble methods in Python: Max voting
Explore the max voting method to combine classifier predictions and increase accuracy.
Free Resources