Ensemble methods in Python: stacking

Ensemble methods in machine learning combine multiple models to improve overall performance. This approach is particularly effective when individual models have limitations or biases.

One prominent ensemble technique is stacking, a method in which the predictions of diverse base models are combined through a meta-model, resulting in a more robust and accurate overall prediction. Stacking works best when the base models are diverse enough to offer complementary insights, so that the meta-model can learn how to weigh their predictions.

Stacking algorithm
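Before using scikit-learn's built-in implementation, it can help to see the idea spelled out by hand. The following is a minimal sketch, not part of the example that follows: it trains the meta-model on a single held-out split (sometimes called blending), whereas scikit-learn's StackingClassifier uses cross-validated out-of-fold predictions internally. The dataset and model choices simply mirror the example below.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

# Split the data: base models train on one half, the meta-model on the other
X, y = load_breast_cancer(return_X_y=True)
X_base, X_meta, y_base, y_meta = train_test_split(X, y, test_size=0.5, random_state=42)

# Step 1: train diverse base models
base_models = [RandomForestClassifier(random_state=42),
               GradientBoostingClassifier(random_state=42)]
for model in base_models:
    model.fit(X_base, y_base)

# Step 2: the base models' predicted probabilities become the meta-model's features
meta_features = np.column_stack([m.predict_proba(X_meta)[:, 1] for m in base_models])

# Step 3: the meta-model learns how to combine the base predictions
meta_model = LogisticRegression().fit(meta_features, y_meta)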

Here’s a step-by-step implementation of the stacking classifier using Python:

Import the libraries

The first step is to import the required libraries.

from sklearn.ensemble import StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score

Load the dataset

The next step is to load the dataset. We’ll use the breast cancer dataset provided by the sklearn library.

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=42)

Define the base models

The next step is to choose the base models whose predictions the stacking classifier will combine. We’ll use RandomForestClassifier and GradientBoostingClassifier for this example.

base_models = [
    ('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
    ('gb', GradientBoostingClassifier(n_estimators=10, random_state=42))
]
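The base models are not limited to tree ensembles; any scikit-learn estimator can be used. Purely as an illustration (not part of the example in this Answer), an expanded list might also include a support vector machine and a k-nearest neighbors classifier:

from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

base_models = [
    ('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
    ('gb', GradientBoostingClassifier(n_estimators=10, random_state=42)),
    ('svc', SVC(random_state=42)),    # stacked via decision_function
    ('knn', KNeighborsClassifier())   # stacked via predict_proba
]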

Define the meta-model

After selecting the base models, we’ll choose a meta-model. We’ll use LogisticRegression for this example.

meta_model = LogisticRegression()

Implement stacking

We’ll now create an instance of StackingClassifier and fit it to the training data to train the model.

stacking_model = StackingClassifier(estimators=base_models, final_estimator=meta_model)
stacking_model.fit(X_train, y_train)
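By default, StackingClassifier trains the meta-model on out-of-fold predictions obtained through 5-fold cross-validation, which helps keep the meta-model from overfitting to the base models’ training-set predictions. If needed, this behavior can be adjusted through the cv and passthrough parameters; the snippet below is an optional variant, not a change to the example above:

# Optional: control how the meta-features are produced
stacking_model = StackingClassifier(
    estimators=base_models,
    final_estimator=meta_model,
    cv=5,               # folds used to generate out-of-fold meta-features
    passthrough=False   # True would also feed the original features to the meta-model
)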

Predict and evaluate

Now, we’ll make predictions on the test set and calculate the accuracy.

y_pred = stacking_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))

Example

The following code shows how we can implement the stacking ensemble classifier in Python:

# Import necessary libraries
from sklearn.ensemble import StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score

# We will load and split the dataset
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=42)

# We will define base models
base_models = [
    ('gb', GradientBoostingClassifier(n_estimators=100, random_state=42)),
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42))
]

# We will define the meta-model
meta_model = LogisticRegression()

# We will implement Stacking
stacking_model = StackingClassifier(estimators=base_models, final_estimator=meta_model)
stacking_model.fit(X_train, y_train)

# We will predict using the stacking model
y_pred = stacking_model.predict(X_test)

# We will evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))

Explanation:

  • Lines 1–7: These lines import the required libraries.

  • Line 10: This line loads the breast cancer dataset from sklearn and stores it in the cancer variable.

  • Line 11: This line splits the dataset into train and test sets.

  • Lines 14–17: We define RandomForestClassifier and GradientBoostingClassifier as the base models for the StackingClassifier.

  • Line 20: We create a meta-model that combines the predictions of the base models. We’ve chosen LogisticRegression in this example.

  • Lines 23–24: These lines create a StackingClassifier with the specified base models and meta-model, and fit it to the training data.

  • Line 27: The trained model is used to make predictions on the test data.

  • Lines 30–31: The code calculates the accuracy of the model’s predictions by comparing them to the true labels in the test set. The accuracy is printed as a percentage.
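A single train/test split can give a noisy accuracy estimate. As an optional follow-up, the whole stacking pipeline can also be scored with cross-validation; the snippet below is a sketch using scikit-learn's cross_val_score with the variables defined in the example:

from sklearn.model_selection import cross_val_score

# Evaluate the full stacking model with 5-fold cross-validation
scores = cross_val_score(stacking_model, cancer.data, cancer.target, cv=5)
print("Cross-validated accuracy: {:.2f}% (+/- {:.2f}%)".format(scores.mean() * 100, scores.std() * 100))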

