Credit score classification is a method for predicting the creditworthiness of a bank account holder who requests a loan. Creditworthiness is calculated from multiple metrics associated with the user's account. Banks and credit card companies now use machine learning algorithms to determine the creditworthiness of account holders.
The credit score metric specifies the creditworthiness of the user. The score can be good, standard, or poor. In this Answer, we'll discuss how machine learning algorithms are used to determine the creditworthiness of users.
The dataset used to determine creditworthiness is intended to evaluate the status of the loan requestor and decide their eligibility for a loan. The bank analyzes the statistics associated with the user's account and the financial stability of the account holder, using metrics like annual income, monthly salary, loans taken, debt, and the balance left at the end of the month. A machine learning model is trained on this dataset to calculate the credit score, which determines whether the user is fit to receive a loan. The dataset used in this model is train.csv.
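Once the libraries from the steps below are installed, you can preview a few of these columns to get a feel for the data. Here is a minimal sketch, assuming train.csv sits in the working directory:

import pandas as pd

# Peek at a few of the metrics used to judge creditworthiness
data = pd.read_csv('train.csv')
print(data[["Annual_Income", "Monthly_Inhand_Salary",
            "Outstanding_Debt", "Monthly_Balance"]].head())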
The classification process consists of multiple steps, from installing dependencies to training a model that predicts the score for the loan requestor. Here is the step-by-step process:
To make the process work, we must install certain dependencies. In Python, we use pip3 to install the libraries. We need the scikit-learn library to use the classification models on our dataset. The following command installs it:
pip3 install scikit-learn
We also need two basic libraries in our code: numpy and pandas.
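If they aren't already installed, the same command works for them:

pip3 install numpy pandas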
The next step is to import the installed libraries into our code.
import pandas as pd
from sklearn.model_selection import train_test_split
import numpy as np
In the code:
Line 1: Import the pandas library to read the dataset for training the model.
Line 2: Import the train_test_split function to split the dataset into two categories: testing and training.
Line 3: Import the numpy library to use arrays in the code.
For model training, it is important to divide the dataset into input and output categories. There are two variables, x and y, that define the input and output of the training model. We use multiple metrics as input to determine the output, Credit_Score, of the user.
data = pd.read_csv('train.csv')
credit_score_map = {'Good': 3, 'Standard': 1, 'Poor': 0}
data["Credit_Mix"] = data["Credit_Mix"].map(credit_score_map)
x = np.array(data[["Annual_Income", "Monthly_Inhand_Salary",
                   "Num_Bank_Accounts", "Num_Credit_Card",
                   "Interest_Rate", "Num_of_Loan",
                   "Delay_from_due_date", "Num_of_Delayed_Payment",
                   "Credit_Mix", "Outstanding_Debt",
                   "Credit_History_Age", "Monthly_Balance"]])
data["Credit_Score"] = data["Credit_Score"].map(credit_score_map)
y = np.array(data[["Credit_Score"]])
In the code above:
Line 1: Read the dataset from the .csv file, train.csv.
Lines 2 and 3: Map the values of the Credit_Mix column to numeric values. The strings Poor, Standard, and Good are mapped to 0, 1, and 3, respectively.
Lines 4–9: Define the input array x that uses variables from the dataset as metrics.
Line 10: Map the values of the Credit_Score column to numeric values using the same mapping of Poor, Standard, and Good to 0, 1, and 3.
Line 11: Create the output variable, y, with the data from the Credit_Score column.
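To see what this mapping does in isolation, here is a minimal sketch applying the same string-to-number mapping to a toy Series, independent of train.csv:

import pandas as pd

# The same mapping the code above applies to Credit_Mix and Credit_Score
s = pd.Series(['Good', 'Poor', 'Standard', 'Good'])
print(s.map({'Good': 3, 'Standard': 1, 'Poor': 0}).tolist())
# Output: [3, 0, 1, 3]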
Divide the dataset into testing and training sets. The training set trains the model by giving the value of the target variable, y, against possible x values. Define the HistGradientBoostingClassifier() model and use the training set to train it. HistGradientBoostingClassifier() works like a decision tree but uses histogram-based algorithms that bin continuous features, so it works efficiently on large datasets.
xtrain, xtest, ytrain, ytest = train_test_split(x, y,
                                                test_size=0.33,
                                                random_state=42)
from sklearn.ensemble import HistGradientBoostingClassifier
model = HistGradientBoostingClassifier()
model.fit(xtrain, ytrain)
In the code:
Lines 1–3: Use the train_test_split function to divide the dataset so that 33 percent goes into the testing set to test the model and the remaining 67 percent goes into the training set to train the model.
Line 4: Import the classifier HistGradientBoostingClassifier() from scikit-learn.
Line 5: Define the HistGradientBoostingClassifier() model.
Line 6: Train the defined model on the training set.
After this, the model is trained on the dataset and is ready to calculate the target variable, y, for sample input values.
Here, you can try testing the model for accuracy by using the testing set:
print(model.score(xtest,ytest))
If the value is 1.0, the model's accuracy is 100 percent. The ultimate goal is to train a model with maximum accuracy. You can test the model on custom data by defining the testing variable metrics.
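Accuracy is a single number; for a per-class breakdown, scikit-learn's classification_report can be used on the same test split. A minimal sketch, assuming the model, xtest, and ytest defined above and that all three classes appear in the test set:

from sklearn.metrics import classification_report

# Per-class precision, recall, and F1 on the held-out test set;
# the sorted label order 0, 1, 3 corresponds to Poor, Standard, Good
ypred = model.predict(xtest)
print(classification_report(ytest.ravel(), ypred,
                            target_names=['Poor', 'Standard', 'Good']))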
metrics = np.array([[a, b, c, d, e, f, g, h, i, j, k, l]])
score = model.predict(metrics)
print("Predicted Credit Score = ", score)
Fill in the values with custom data in the metrics 2D array, following the same column order used for x. Because Credit_Score was mapped to numbers, the predicted score is 0, 1, or 3, corresponding to Poor, Standard, and Good, indicating the final decision as to whether the requestor is eligible for a loan or not.
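For instance, here is a sketch with made-up metric values (purely illustrative, not real account data) that also converts the numeric prediction back to its label, assuming the model trained above:

import numpy as np

# Hypothetical values for the 12 metrics, in the training column order:
# Annual_Income, Monthly_Inhand_Salary, Num_Bank_Accounts, Num_Credit_Card,
# Interest_Rate, Num_of_Loan, Delay_from_due_date, Num_of_Delayed_Payment,
# Credit_Mix, Outstanding_Debt, Credit_History_Age, Monthly_Balance
metrics = np.array([[40000.0, 3300.0, 3, 4, 12, 2, 5, 7, 1, 1500.0, 220, 310.0]])

# Reverse the credit_score_map to recover the label string
label_map = {0: 'Poor', 1: 'Standard', 3: 'Good'}
score = model.predict(metrics)
print("Predicted Credit Score =", label_map[int(score[0])])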