How to compute correlation using pandas

pandas is a popular Python-based data analysis toolkit that can be imported using:

import pandas as pd

It offers a wide range of utilities, from parsing many file formats to converting an entire data table into a NumPy array. This versatility makes pandas a staple of data science and machine learning.

pandas also provides many statistical tools for data analysis. One such tool is correlation, which measures how strongly pairs of columns are related to each other.
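For reference, Pearson's coefficient (the default method of DataFrame.corr) between two columns x and y with n paired observations is:

$$ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$

Here \bar{x} and \bar{y} are the column means. The result lies between -1 (a perfect negative linear relationship) and 1 (a perfect positive one), and a value near 0 indicates little or no linear relationship.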

The method that computes a pairwise correlation table, shown with its default arguments, is:

DataFrame.corr(method='pearson', min_periods=1)
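As a quick check, calling corr() with no arguments should give the same table as spelling out those defaults. The small DataFrame below, with columns x and y, is made up purely for illustration.

```python
import pandas as pd

# Hypothetical DataFrame with two numeric columns
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 5, 4, 5]})

# No arguments: pandas falls back to method="pearson", min_periods=1
default_corr = df.corr()

# The same call with the defaults written out explicitly
explicit_corr = df.corr(method="pearson", min_periods=1)

print(default_corr)
print(default_corr.equals(explicit_corr))  # the two matrices are identical
```

The result is a square DataFrame indexed by column name on both axes, so a single pair can be read with default_corr.loc["x", "y"].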

Parameters

method: {‘pearson’, ‘kendall’, ‘spearman’} or callable - Method of correlation:
- pearson: standard correlation coefficient
- kendall: Kendall Tau correlation coefficient
- spearman: Spearman rank correlation
- callable: any callable that takes two 1-D ndarrays as input and returns a float
min_periods: int - The minimum number of observations required per pair of columns to have a valid result. Currently only available for Pearson and Spearman correlation.
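The sketch below exercises each parameter on a small, made-up DataFrame in which column b has one missing value. The function manual_pearson is a hypothetical helper written only for this example; it recomputes Pearson's coefficient with NumPy to illustrate the callable form. Note that method='kendall' relies on SciPy being installed, so the sketch skips it if SciPy is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data; column "b" has one missing value
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [2.0, 1.0, 4.0, 3.0, np.nan, 6.0],
    "c": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
})

# Built-in methods, selected by name
pearson = df.corr(method="pearson")
spearman = df.corr(method="spearman")
try:
    kendall = df.corr(method="kendall")  # uses SciPy under the hood
except ImportError:
    kendall = None  # SciPy not installed

# Callable form: receives two 1-D ndarrays and must return a float.
# This hypothetical helper recomputes Pearson's r with NumPy.
def manual_pearson(x: np.ndarray, y: np.ndarray) -> float:
    return float(np.corrcoef(x, y)[0, 1])

custom = df.corr(method=manual_pearson)

# min_periods: a pair with fewer non-missing rows than this yields NaN.
# "a" and "b" share only 5 complete rows, so requiring 6 blanks that pair.
strict = df.corr(min_periods=6)

print(pearson, spearman, custom, strict, sep="\n\n")
```

Because pandas drops missing values pairwise before computing each coefficient, custom should match pearson up to floating-point rounding, while strict shows NaN wherever fewer than six complete rows are available.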

Code

The following code shows how to compute correlation in Python. You can change the parameters and observe how the output varies.

It computes the correlation between dogs and cats using the default settings.
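A minimal sketch of such an example, assuming a DataFrame with hypothetical dogs and cats columns (the counts below are invented for illustration):

```python
import pandas as pd

# Hypothetical data: number of dogs and cats in five households
pets = pd.DataFrame({
    "dogs": [1, 0, 2, 3, 1],
    "cats": [2, 4, 1, 0, 3],
})

# Full correlation table with the default settings (Pearson, min_periods=1)
table = pets.corr()
print(table)

# Coefficient for just this pair of columns, as a single float
r = pets["dogs"].corr(pets["cats"])
print(r)

# Change a parameter and compare the output
print(pets.corr(method="spearman"))
```

Series.corr returns one coefficient for a single pair of columns, which is often handier than the full matrix when only two variables are involved.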
