Implementing logistic regression model for binary classification

In this answer, we'll look at how to implement a logistic regression model for binary classification tasks.

What is binary classification?

Binary classification is a branch of supervised machine learning that categorizes examples into one of two classes or categories that are mutually exclusive. The two classes are frequently referred to as the positive class, denoted by the number 1, and the negative class, denoted by the number 0.

Some real-life binary classification problems include:

  • Email spam detection: Classifying emails as spam or non-spam based on their content and attributes.

  • Disease diagnosis: Classifying patients as having a specific disease or not based on their symptoms, medical test results, and medical history.

  • Credit card fraud detection: Recognizing bogus credit card transactions from a large volume of transactions based on various features and patterns.

  • Sentiment analysis: Classifying text or social media posts as positive or negative sentiments to analyze customer opinions, feedback, or sentiment trends.

  • Churn prediction: Predicting whether customers will churn (cancel their subscription or leave a service) based on their behaviour, usage patterns, and demographic information.

  • Image classification: Distinguishing between different objects or categories in images, such as classifying images of cats and dogs.

What is a logistic regression model?

A statistical model called a logistic regression model is employed for binary classification tasks. Contrary to what its name implies, logistic regression is a classification algorithm. It is based on the idea of regression and is known as “logistic regression” because it uses a logistic (sigmoid) function to translate the output to a probability value between 0 and 1.

The logistic regression model can be represented mathematically as:

      p(y=1X)=1(1+exp(z))p(y=1|X) =\frac{1}{(1 + exp(-z))},

where

  • p(y=1X)=p(y=1|X)= the likelihood of the favourable class in light of the input features XX,

  • exp()=exp()= the exponential function,

  • z=z= the linear combination of input features and their corresponding coefficients.

Implementing a logistic regression model

The following steps show how to implement the logistic regression model.

Step 1: Import the necessary dependencies for your binary classification task.

Here, we'll import the LogisticRegression model from linear_model module in the scikit-learn library, numpy module, pyplot from the matplotlib library, and the train_test_split() method from the model_selection module in scikit-learn library.

from sklearn.linear_model import LogisticRegression
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

Step 2: Load the data

In this case, we'll be making use of synthetic data containing 400 samples in generating the input features and labels for our model. It is important to make sure that your data is binary labelled data.

# creating random data
np.random.seed(0)
num_samples = 400
X = np.random.randn(num_samples, 2) # Random input features
y = np.random.randint(0, 2, num_samples) # Random binary labels
print("Input features")
print(X)
print("Binary labels")
print(y)

Step 3: Split the data

Using the train_test_split() method we can now divide our data into training and testing datasets, where 80% of the data will be used for training while the rest of it will be used for testing. For the splitting, we set the randomness to 42 by using the random_state argument.

# splitting the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print("Training features shape")
print(X_train.shape)
print("Testing features shape")
print(X_test.shape)
print("Training labels shape")
print(y_train.shape)
print("Testing labels shape")
print(y_test.shape)

Step 4: Call an instance of the model and train it

We'll call the instance of the model, LogisticRegression(), and train it using our training and testing data.

# Define and train the logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

Step 5: Evaluating the logistic regression model

We can obtain its accuracy on the test data using the .score() attribute of the logistic regression model.

# Evaluate the model
accuracy = model.score(X_test, y_test)
print(f"Accuracy of the Logistic Regression Model: {round(accuracy*100)}%")

Summary

To implement the regression model in a binary classification problem, you will take the following steps:

  • Import dependencies as well as the model.

  • Load your data containing samples to generate the input features and binary labels.

  • Divide the data into testing and training samples, keeping a larger percentage of the data for testing.

  • Define and train your model.

  • Evaluate your model.

Free Resources

Copyright ©2025 Educative, Inc. All rights reserved