Ensemble methods in machine learning combine the strengths of multiple models for enhanced performance. Averaging, a key technique, consolidates predictions by taking a (possibly weighted) average of the individual models' outputs. This often improves overall performance, reduces overfitting, and makes the final model more robust. The method is simple yet effective, providing a straightforward way to leverage diverse insights from individual models.
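To make the idea concrete, here is a minimal sketch of averaging using made-up probability estimates from three hypothetical models (the numbers are illustrative, not from trained models):

```python
import numpy as np

# Hypothetical probability estimates from three models for four samples
preds_a = np.array([0.9, 0.2, 0.6, 0.8])
preds_b = np.array([0.7, 0.4, 0.5, 0.9])
preds_c = np.array([0.8, 0.1, 0.7, 0.6])

# Simple average of the three models' outputs
# e.g. for the first sample: (0.9 + 0.7 + 0.8) / 3 = 0.8
avg = (preds_a + preds_b + preds_c) / 3

# Weighted average, giving the first model twice the influence
# e.g. for the first sample: (2*0.9 + 0.7 + 0.8) / 4 = 0.825
weights = np.array([2, 1, 1])
weighted_avg = np.average(np.vstack([preds_a, preds_b, preds_c]), axis=0, weights=weights)

print(avg)
print(weighted_avg)
```

A final class label can then be obtained by thresholding the averaged probabilities, e.g. `avg >= 0.5`.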
Let’s look at the steps required to implement the averaging algorithm in Python.
The first step is to import the required libraries.
```python
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
```
The next step is to load the dataset. We’ll use the breast cancer dataset provided by the sklearn library. This dataset consists of 30 features. The target variable is the diagnosis, where 0 represents malignant and 1 represents benign tumors. The train_test_split function divides the dataset into training and testing data.
```python
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=42)
```
The next step is to choose the base models. The averaging classifier combines multiple models by taking a weighted average of their predicted probabilities. We’ll use the random forest classifier and the gradient boosting classifier for this example.
```python
rf_model = RandomForestClassifier(n_estimators=10, random_state=42)
gb_model = GradientBoostingClassifier(n_estimators=10, random_state=42)
```
We’ll now create a VotingClassifier instance to aggregate predictions from the base models and fit it on the training data. We set the voting method to soft, which calculates a weighted average of the models’ probability estimates.
```python
averaging_model = VotingClassifier(estimators=[('rf', rf_model), ('gb', gb_model)], voting='soft')
averaging_model.fit(X_train, y_train)
```
Now, we’ll make the predictions on the test set and calculate accuracy.
```python
y_pred = averaging_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))
```
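As a sanity check (a sketch, not part of the original steps), soft voting with equal weights should reproduce the mean of the base models' predict_proba outputs. The variable names below (rf, gb, ensemble, manual_proba) are introduced here for illustration:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=42)

# Fit the same base models standalone (fixed random_state makes them reproducible)
rf = RandomForestClassifier(n_estimators=10, random_state=42).fit(X_train, y_train)
gb = GradientBoostingClassifier(n_estimators=10, random_state=42).fit(X_train, y_train)
ensemble = VotingClassifier(
    estimators=[('rf', rf), ('gb', gb)], voting='soft').fit(X_train, y_train)

# Averaging the base models' probabilities by hand...
manual_proba = (rf.predict_proba(X_test) + gb.predict_proba(X_test)) / 2
# ...should match the ensemble's soft-voting probabilities
print(np.allclose(manual_proba, ensemble.predict_proba(X_test)))
```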
The following code shows how we can implement the averaging
ensemble classifier in Python:
```python
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score

# Load and split the dataset
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=42)

# Define base models
rf_model = RandomForestClassifier(n_estimators=10, random_state=42)
gb_model = GradientBoostingClassifier(n_estimators=10, random_state=42)

# Create an ensemble using averaging
averaging_model = VotingClassifier(estimators=[('rf', rf_model), ('gb', gb_model)], voting='soft')
averaging_model.fit(X_train, y_train)

# Predict and evaluate
y_pred = averaging_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))
```
Lines 1–5: We import the required libraries.
Line 8: We load the breast cancer dataset from sklearn and store it in the cancer variable.
Line 9: This line splits the dataset into training and testing sets.
Lines 12–13: We define RandomForestClassifier and GradientBoostingClassifier as the base models for the VotingClassifier.
Lines 16–17: We create a VotingClassifier with the specified base models. We use the soft-voting method to calculate the weighted average of the models’ probability estimates.
Line 20: The trained model is used to make predictions on the test data.
Lines 21–22: The code calculates the accuracy of the model’s predictions by comparing them to the true labels in the test set. The accuracy is printed as a percentage.
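The averaging above is uniform, but VotingClassifier also accepts a weights parameter for non-uniform averaging. The sketch below is a hedged variation on the example, with the weights=[2, 1] choice being an illustrative assumption that gives the random forest twice the influence of the gradient boosting model:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=42)

# weights=[2, 1] counts the random forest's probabilities twice
# when the soft-voting average is computed
weighted_model = VotingClassifier(
    estimators=[('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
                ('gb', GradientBoostingClassifier(n_estimators=10, random_state=42))],
    voting='soft',
    weights=[2, 1])
weighted_model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, weighted_model.predict(X_test))
print("Accuracy: {:.2f}%".format(accuracy * 100))
```

Whether unequal weights help depends on the relative strength of the base models; in practice, the weights can be tuned via cross-validation.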
Unlock your potential: Ensemble learning series, all in one place!
To continue your exploration of ensemble learning, check out our series of Answers below:
What is ensemble learning?
Understand the concept of combining multiple models to improve predictions.
Ensemble methods in Python: Averaging
Learn how averaging methods can boost model accuracy and stability.
Ensemble methods in Python: Bagging
Discover the power of bagging in reducing variance and enhancing prediction performance.
Ensemble methods in Python: Boosting
Dive into boosting techniques that improve weak models by focusing on mistakes.
Ensemble methods in Python: Stacking
Understand how stacking combines multiple models to make better predictions.
Ensemble methods in Python: Max voting
Explore the max voting method to combine classifier predictions and increase accuracy.