A **correlation matrix** is used to show the degree of the linear relationship between variables in a dataset. It indicates the correlation using the correlation coefficient. 

The **correlation coefficient** shows how strongly or weakly any two variables are linearly related. Its value ranges from -1 to 1. A value of 1 indicates *a perfect positive correlation*, whereas -1 indicates *a perfect negative correlation*. Values closer to 0 indicate a weak linear correlation.

# Understanding the correlation coefficient

**Correlation** refers to the degree of relationship between variables. The relationship can be *causal* or *non-causal*, so a correlation on its own does not show that one variable causes the other. We say that there is a positive correlation when an increase in variable $x$ tends to be accompanied by an increase in variable $y$. We say that there is a negative correlation when an increase in variable $x$ tends to be accompanied by a decrease in variable $y$.

The illustration below shows positive and negative correlations:

The table below summarizes correlation coefficients:

|  Coefficient | Meaning  |
| - | - |
| 1  | Perfect positive correlation. The points lie exactly on a straight line with a positive slope: $y$ increases as $x$ increases.  |
| -1  | Perfect negative correlation. The points lie exactly on a straight line with a negative slope: $y$ decreases as $x$ increases.  |
| 0  | No linear correlation. The variables may still be related in a nonlinear way.  |
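As a minimal sketch of how these three cases arise, the snippet below computes the Pearson coefficient by hand with NumPy. The helper name `pearson_r` and the toy data are illustrative assumptions, not part of the original example.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

x = [1, 2, 3, 4, 5]
y_up = [10 + 3 * v for v in x]       # straight line, rises by 3 per unit of x
y_down = [10 - 0.5 * v for v in x]   # straight line, falls by 0.5 per unit of x
y_curve = [(v - 3) ** 2 for v in x]  # U-shape centered on x = 3

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
r_curve = pearson_r(x, y_curve)
print(r_up, r_down, r_curve)
```

The two straight lines give coefficients of 1 and -1 (up to floating-point rounding) even though their slopes are not 1, while the U-shaped data gives a coefficient of 0 although $y$ is fully determined by $x$. The coefficient captures only the linear part of a relationship.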

A correlation matrix displays the correlation between every pair of numerical variables in the dataset. If a dataset has $n$ numerical features, the correlation matrix has $n \times n = n^2$ entries and is symmetric about its main diagonal, because the correlation of $x$ with $y$ equals the correlation of $y$ with $x$. Therefore, it is sufficient to analyze only the upper or lower triangle of the matrix.
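The symmetry can be checked directly. The sketch below reuses the small three-column dataset from the example in this article and keeps only the entries above the diagonal. The variable names `upper_rows`, `upper_cols`, and `pairs` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [45, 37, 42, 35, 39],
    'B': [38, 31, 26, 28, 33],
    'C': [10, 15, 17, 21, 12],
})

corr = df.corr()   # n x n matrix of pairwise coefficients
n = corr.shape[0]

# The matrix equals its own transpose, so it is symmetric
is_symmetric = np.allclose(corr.values, corr.values.T)

# Indices of the entries strictly above the main diagonal
upper_rows, upper_cols = np.triu_indices(n, k=1)

# One entry per unordered pair of columns
pairs = {
    (corr.index[i], corr.columns[j]): corr.iat[i, j]
    for i, j in zip(upper_rows, upper_cols)
}

print(is_symmetric)
for names, r in pairs.items():
    print(names, round(r, 3))
```

For $n$ columns this leaves $n(n-1)/2$ distinct pairs, which is 3 pairs for the 3 columns used here.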

The illustration below shows a visual representation of a correlation matrix:

> The diagonal always has a coefficient of 1.00, since each diagonal entry is the correlation of a variable with itself.

> A gradient color scheme makes the coefficient values easier to compare at a glance.

# Example

The code snippet below shows how we can create a correlation matrix in Python:

```python
import pandas as pd # for creating a dataframe
import seaborn as sn # for drawing the heatmap
import matplotlib.pyplot as plt # for displaying the plot

# Data for matrix
data = {'A': [45,37,42,35,39],
        'B': [38,31,26,28,33],
        'C': [10,15,17,21,12]
        }

df = pd.DataFrame(data,columns=['A','B','C'])
print("Original Matrix")
print(df) # original matrix

print("\n")
corrMatrix = df.corr() # finding correlations
print("Correlation Coefficients Matrix")
print(corrMatrix) # printing correlations
```

```python
# Visual representation of the correlation matrix
sn.heatmap(corrMatrix, annot=True, cmap='Blues')
plt.show() # display the figure
```


> `Line 11` creates a dataframe. A *dataframe* is a table of labeled columns, which can be treated here as a *matrix*.

> `Line 16` calls the `corr` method on our dataframe to calculate the matrix of correlation coefficients. By default, pandas computes the Pearson coefficient.

The second code snippet is a continuation of the first. It creates a visualization of the correlation matrix using Seaborn and Matplotlib: `sn.heatmap` takes in the correlation coefficients, annotates each cell with its value, and shades the cells using the `Blues` colormap.
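To sanity-check a value that the heatmap annotates, a single entry of the matrix can be recomputed for one pair of columns. This is a minimal sketch using the same data as above. The names `r_ab` and `r_ab_np` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [45, 37, 42, 35, 39],
    'B': [38, 31, 26, 28, 33],
    'C': [10, 15, 17, 21, 12],
})

corrMatrix = df.corr()

# Correlation of one pair of columns, computed by pandas
r_ab = df['A'].corr(df['B'])

# The same pair computed with NumPy for comparison
r_ab_np = np.corrcoef(df['A'], df['B'])[0, 1]

print(corrMatrix.loc['A', 'B'], r_ab, r_ab_np)
```

All three printed values should agree up to floating-point rounding, since each one is the Pearson coefficient of columns `A` and `B`.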
