No single model is best for churn prediction; the right choice depends on the dataset. Common candidates include logistic regression, decision trees, random forests, and gradient boosting machines.
Key takeaways
Customer churn prediction helps businesses identify which customers are likely to stop using their products or services. This is done by analyzing past behavior and patterns in customer data to understand signs of potential churn.
By predicting churn, businesses can take proactive measures like targeted campaigns and personalized offers to retain at-risk customers, boosting overall customer satisfaction and profitability.
Machine learning models such as decision trees can efficiently predict churn, offering actionable insights into customer retention strategies, while model tuning can further improve prediction accuracy and business outcomes.
Customer churn prediction involves identifying individuals who may discontinue their usage of a product or service. This is achieved by analyzing past customer data to recognize patterns and behaviors indicating potential churn. By utilizing machine learning algorithms, businesses can predict which customers are at risk of churning. The objective is to implement preemptive measures, like targeted marketing campaigns and personalized offers, to retain customers and enhance satisfaction, thereby bolstering business profitability.
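We'll build the model in several steps: initialize a decision tree classifier, split the data, train and evaluate the model, and then retrain it with adjusted hyperparameters.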
DecisionTreeClassifier
We import DecisionTreeClassifier from scikit-learn and train_test_split for data splitting, then initialize a DecisionTreeClassifier object, and finally display the first few rows of the DataFrame df.
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

dectree = DecisionTreeClassifier()
# Preview the first few rows of the DataFrame
df.head()
We split the dataset into features (X) and the target variable (y), then split both into training and testing sets. Setting test_size=0.3 reserves 30% of the data for testing and leaves the remaining 70% for training, while random_state=101 makes the split reproducible. We then fit the DecisionTreeClassifier model to the training data and make predictions on the test data, storing them in the variable dectree_predict.
# Separate the features from the target column
X = df.drop('Exited', axis=1)
y = df['Exited']
# 70-30 train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
dectree.fit(X_train, y_train)
# Making predictions
dectree_predict = dectree.predict(X_test)
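The split above samples rows at random. If the churned class is much smaller than the retained class (an assumption; the class balance of df is not shown here), a stratified split keeps the churn rate similar in both sets. The following is a minimal sketch on synthetic data generated with make_classification, not the tutorial's dataset; the 80/20 class weights and the X_demo, y_demo names are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn data (assumption: roughly 20% positives)
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=8, weights=[0.8, 0.2], random_state=0
)

# stratify=y_demo preserves the class proportions in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=101
)

print(f"Train rows: {len(X_tr)}, test rows: {len(X_te)}")
print(f"Positive rate (train): {y_tr.mean():.3f}")
print(f"Positive rate (test):  {y_te.mean():.3f}")
```

With the tutorial's data, the equivalent change would be adding stratify=y to the existing train_test_split call.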
We compute and print a classification report, which includes precision, recall, F1-score, and support for each class, by comparing the decision tree's predictions (dectree_predict) with the true test labels (y_test). Additionally, we calculate and print the accuracy and f1_score for the test set predictions. For binary labels, f1_score reports the score of the positive class (label 1) by default.
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, f1_score

print(f" Classification report :\n {classification_report(y_test, dectree_predict)}")
print("Accuracy (Test Set): %.2f" % accuracy_score(y_test, dectree_predict))
print("F1-Score (Test Set): %.2f" % f1_score(y_test, dectree_predict))
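To make the reported metrics concrete, here is a small hand calculation in plain Python. The ten labels are made-up toy values (1 = churned, 0 = retained), not results from the tutorial's model; the sketch only shows how accuracy, precision, recall, and F1 follow from the four outcome counts.

```python
# Toy labels: 1 = churned, 0 = retained (illustrative values only)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # churners caught
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # churners missed
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # retained, predicted retained

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.2f}")   # 0.70
print(f"Precision: {precision:.2f}")  # 0.67
print(f"Recall:    {recall:.2f}")     # 0.50
print(f"F1-score:  {f1:.2f}")         # 0.57
```

In this toy case accuracy is 0.70 even though only half of the churners are caught, which is why the F1-score is worth reading alongside accuracy when the classes are uneven.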
We create a DataFrame, matrix_df, containing the confusion matrix computed from the actual labels (y_test) and the predictions (dectree_predict). We then plot the confusion matrix as a heatmap using Seaborn, annotating each cell with its count. Finally, we set the title, x-axis label, and y-axis label, and display the plot.
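import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

matrix_df = pd.DataFrame(confusion_matrix(y_test, dectree_predict))
# Plot the result; create the figure and axes together so the heatmap
# is drawn on the 10x7 figure rather than on a separate, earlier one
sns.set(font_scale=1.3)
fig, ax = plt.subplots(figsize=(10, 7))
sns.heatmap(matrix_df, annot=True, fmt="g", ax=ax, cmap="magma")
# Set axis titles
ax.set_title('Confusion Matrix - Decision Tree')
ax.set_xlabel("Predicted label", fontsize=15)
ax.set_ylabel("True Label", fontsize=15)
plt.show()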
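When reading the heatmap, it helps to know how scikit-learn lays out the matrix: rows are true labels and columns are predicted labels, so for binary 0/1 labels the cells are [[TN, FP], [FN, TP]]. A minimal sketch with made-up toy labels (not the tutorial's predictions) that unpacks the four cells:

```python
from sklearn.metrics import confusion_matrix

# Toy labels: 1 = churned, 0 = retained (illustrative values only)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

cm = confusion_matrix(y_true, y_pred)
# Flattening row by row yields TN, FP, FN, TP for binary labels
tn, fp, fn, tp = cm.ravel()

print(cm)
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```

The bottom-left cell (FN) counts churners the model missed, which is usually the costliest error for a retention campaign.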
We initialize a new DecisionTreeClassifier with specified hyperparameters (criterion='entropy', min_samples_split=10, min_samples_leaf=6, max_features='sqrt', random_state=1), train it on the training data, and make predictions on the test data. Then, we print the classification report, confusion matrix, accuracy, and F1-score for the new decision tree classifier (dectreeclasfier_new).
dectreeclasfier_new = DecisionTreeClassifier(criterion='entropy', min_samples_split=10, min_samples_leaf=6, max_features='sqrt', random_state=1)
dectreeclasfier_new.fit(X_train, y_train)
dectreeclasfier_predict = dectreeclasfier_new.predict(X_test)

print(f" Classification report :\n {classification_report(y_test, dectreeclasfier_predict)}")
print(f" Confusion Matrix :\n {confusion_matrix(y_test, dectreeclasfier_predict)}")
print("Accuracy (Test Set): %.2f" % accuracy_score(y_test, dectreeclasfier_predict))
print("F1-Score (Test Set): %.2f" % f1_score(y_test, dectreeclasfier_predict))
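The hyperparameters above are set by hand. One common alternative, which this walkthrough does not use, is to search a small grid with cross-validation via scikit-learn's GridSearchCV. The following is a minimal sketch on synthetic data; the grid values, the f1 scoring choice, and the X_demo, y_demo names are assumptions for illustration, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data (assumption: roughly 20% positives)
X_demo, y_demo = make_classification(
    n_samples=600, n_features=8, weights=[0.8, 0.2], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=101
)

# Hypothetical grid: every combination is scored with 3-fold cross-validation
param_grid = {
    "criterion": ["gini", "entropy"],
    "min_samples_split": [2, 10],
    "min_samples_leaf": [1, 6],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=1), param_grid, scoring="f1", cv=3
)
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
print("Mean cross-validated F1: %.2f" % search.best_score_)
# score() applies the same f1 scorer to the refitted best model
print("Test-set F1: %.2f" % search.score(X_te, y_te))
```

Because the search only sees the training folds, the test set stays untouched until the final score, which keeps the comparison with the hand-set tree fair.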