In this answer, we'll look at how to implement a logistic regression model for binary classification tasks.
Binary classification is a supervised machine learning task that assigns each example to one of two mutually exclusive classes. The two classes are commonly called the positive class, labelled 1, and the negative class, labelled 0.
Some real-life binary classification problems include:
Email spam detection: Classifying emails as spam or non-spam based on their content and attributes.
Disease diagnosis: Classifying patients as having a specific disease or not based on their symptoms, medical test results, and medical history.
Credit card fraud detection: Identifying fraudulent credit card transactions within a large volume of transactions based on various features and patterns.
Sentiment analysis: Classifying text or social media posts as positive or negative sentiments to analyze customer opinions, feedback, or sentiment trends.
Churn prediction: Predicting whether customers will churn (cancel their subscription or leave a service) based on their behaviour, usage patterns, and demographic information.
Image classification: Distinguishing between two categories of images, such as classifying an image as a cat or a dog.
Logistic regression is a statistical model used for binary classification. Despite its name, it is a classification algorithm rather than a regression algorithm. It earns the name because it fits a linear (regression-style) combination of the input features and then passes the result through the logistic (sigmoid) function, which maps it to a probability between 0 and 1.
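The logistic regression model can be represented mathematically, in its standard form, as:
$$P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b) = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}}$$
where $\mathbf{x}$ is the vector of input features, $\mathbf{w}$ is the vector of learned weights, $b$ is the bias (intercept) term, and $\sigma$ is the logistic (sigmoid) function.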
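As a minimal sketch of the logistic (sigmoid) mapping described above, the following NumPy snippet turns a linear score into a probability and then into a class label. The names `sigmoid`, `predict_probability`, `weights`, and `bias`, as well as the parameter values, are illustrative assumptions and not part of scikit-learn.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_probability(X, weights, bias):
    """Return the estimated probability of the positive class for each row of X."""
    return sigmoid(X @ weights + bias)

# Hypothetical parameters, chosen only for illustration
weights = np.array([1.5, -0.5])
bias = 0.25
X_demo = np.array([[0.0, 0.0], [2.0, 1.0], [-3.0, 2.0]])

probabilities = predict_probability(X_demo, weights, bias)
labels = (probabilities >= 0.5).astype(int)  # threshold at 0.5 to get class labels
print(probabilities)
print(labels)
```

Thresholding at 0.5 is the usual default, but the cutoff can be moved when one kind of error is more costly than the other.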
The following steps show how to implement the logistic regression model.
Here, we'll import the LogisticRegression model from the linear_model module of the scikit-learn library, the numpy module, pyplot from the matplotlib library, and the train_test_split() function from the model_selection module of scikit-learn.
from sklearn.linear_model import LogisticRegression
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
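Here we use synthetic data: 400 samples, each with two random input features and a random binary label. Because the labels are drawn independently of the features, the trained model cannot be expected to do much better than chance (about 50% accuracy); the data serves only to demonstrate the workflow. With real data, make sure the labels take exactly two values.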
# creating random data
np.random.seed(0)
num_samples = 400
X = np.random.randn(num_samples, 2)  # Random input features
y = np.random.randint(0, 2, num_samples)  # Random binary labels
print("Input features")
print(X)
print("Binary labels")
print(y)
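One way to confirm that the labels are binary before training is sketched below. The helper `is_binary_labelled` is a hypothetical name introduced here for illustration; it assumes the two classes are encoded as 0 and 1, as in the synthetic data above.

```python
import numpy as np

def is_binary_labelled(y):
    """Return True if y contains only the labels 0 and 1."""
    return bool(np.isin(np.unique(y), [0, 1]).all())

np.random.seed(0)
y = np.random.randint(0, 2, 400)  # same kind of random binary labels as above

print(is_binary_labelled(y))
print(np.bincount(y, minlength=2))  # number of samples in class 0 and class 1
```

The class counts are worth a glance as well, since a heavily imbalanced dataset makes plain accuracy a misleading metric.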
Using the train_test_split() function, we can now divide our data into training and testing datasets: 80% of the data is used for training and the remaining 20% for testing. We pass 42 as the random_state argument, which fixes the seed of the shuffling so that the split is reproducible.
# splitting the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print("Training features shape")
print(X_train.shape)
print("Testing features shape")
print(X_test.shape)
print("Training labels shape")
print(y_train.shape)
print("Testing labels shape")
print(y_test.shape)
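The effect of random_state can be checked directly, as in the sketch below: two calls with the same value return identical splits. The second part uses the `stratify` parameter of train_test_split(), which is not used in the walkthrough above and is shown here only as an optional variation that keeps the class proportions similar in both subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

np.random.seed(0)
X = np.random.randn(400, 2)
y = np.random.randint(0, 2, 400)

# The same random_state produces the same split on every call
split_a = train_test_split(X, y, test_size=0.2, random_state=42)
split_b = train_test_split(X, y, test_size=0.2, random_state=42)
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same_split)

# Optional variation: stratify=y keeps class proportions similar in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
print(y.mean(), y_train.mean(), y_test.mean())
```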
We'll create an instance of the model with LogisticRegression() and train it on the training data only, keeping the testing data aside for evaluation.
# Define and train the logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)
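After fitting, the learned parameters are available through the `coef_` and `intercept_` attributes. The sketch below, which for brevity fits on the full synthetic dataset rather than a training split, recomputes the positive-class probability by hand with the sigmoid function and compares it with the output of `predict_proba()`. The variable names `weights`, `bias`, and `manual_proba` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

np.random.seed(0)
X = np.random.randn(400, 2)
y = np.random.randint(0, 2, 400)

model = LogisticRegression()
model.fit(X, y)

# Learned parameters: one weight per input feature plus an intercept
weights = model.coef_[0]
bias = model.intercept_[0]

# Recompute the positive-class probability by hand with the sigmoid function
manual_proba = 1.0 / (1.0 + np.exp(-(X @ weights + bias)))
sklearn_proba = model.predict_proba(X)[:, 1]
print(np.allclose(manual_proba, sklearn_proba))
```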
We can obtain the model's accuracy on the test data using its .score() method, which returns the fraction of test samples that are classified correctly.
# Evaluate the model
accuracy = model.score(X_test, y_test)
print(f"Accuracy of the Logistic Regression Model: {round(accuracy*100)}%")
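To see what the accuracy figure is made of, the sketch below computes it by hand from the predictions and also builds a confusion matrix with `confusion_matrix` from `sklearn.metrics`. The confusion matrix is an addition not covered in the steps above; it breaks the result down into correct and incorrect predictions per class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

np.random.seed(0)
X = np.random.randn(400, 2)
y = np.random.randint(0, 2, 400)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Accuracy is the fraction of test samples whose predicted label is correct
manual_accuracy = np.mean(y_pred == y_test)
print(manual_accuracy, model.score(X_test, y_test))

# Rows are true classes (0, 1); columns are predicted classes (0, 1)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)
```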
To implement the logistic regression model for a binary classification problem, take the following steps:
Import dependencies as well as the model.
Load your data containing samples to generate the input features and binary labels.
Divide the data into training and testing samples, keeping the larger percentage of the data for training.
Define and train your model.
Evaluate your model.
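The steps above can be collected into a single function, sketched below. Unlike the random labels used earlier, the labels here are derived from the features through an assumed rule (positive when the two features sum to more than zero), so the model has a real pattern to learn. The function name `run_pipeline` and the labelling rule are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def run_pipeline(num_samples=400, seed=0):
    """Generate data, split it, train a model, and return the model and test accuracy."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((num_samples, 2))
    # Assumed labelling rule: positive class when the two features sum to more than zero
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    # Keep 80% of the data for training and 20% for testing
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    model = LogisticRegression()
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)

model, accuracy = run_pipeline()
print(f"Accuracy: {accuracy:.2f}")
```

Because these labels depend on the features, the accuracy should land well above the roughly 50% expected from purely random labels.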