Seaborn cheatsheet

Seaborn is a Python data visualization library built on top of Matplotlib, offering a high-level interface for creating visually appealing statistical graphics. It seamlessly integrates with pandas DataFrames, making it easy to visualize datasets loaded into pandas. Seaborn excels at producing various statistical visualizations such as scatter plots, line plots, bar plots, histograms, kernel density estimates, violin plots, and box plots. These plots often include built-in statistical estimation and aggregation functions, allowing users to quickly analyze and visualize data distributions.

Importing Seaborn

Before importing Seaborn, make sure that Seaborn and Matplotlib are installed in your environment. In a Jupyter notebook, you can install both libraries with the following commands (drop the leading ! when running them in a terminal):

!pip install matplotlib
!pip install seaborn

After successful installation, we can import the libraries like so:

import matplotlib.pyplot as plt
import seaborn as sns

Preparation of data

After importing the libraries, let's look at two ways to obtain data for visualization:

Creating our dataset

We can create our own dataset using NumPy and pandas. First, we create the data points as NumPy arrays, and then we combine those arrays into a DataFrame using pandas.

import numpy as np
import pandas as pd
col1 = np.random.rand(15)  # 15 uniform random values in [0, 1)
data = pd.DataFrame({
    'x': col1,
    'y': np.random.normal(0.1, 5, 15),  # 15 normal samples: mean 0.1, standard deviation 5
})
print(data)
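Once the DataFrame exists, it can be passed straight to a Seaborn plotting function through the data parameter. The following is a minimal sketch rather than part of the original walkthrough: the seeded generator rng and the plot title are assumptions added here so that the output is reproducible.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Assumption: a seeded generator, so the same random values appear on every run
rng = np.random.default_rng(seed=0)
data = pd.DataFrame({
    "x": rng.random(15),            # uniform values in [0, 1)
    "y": rng.normal(0.1, 5, 15),    # normal values: mean 0.1, standard deviation 5
})

# Column names are given as strings; Seaborn looks them up in `data`
ax = sns.scatterplot(x="x", y="y", data=data)
ax.set_title("Scatter Plot of a Generated Dataset")
plt.show()
```

Referring to columns by name keeps the plotting call the same whether the DataFrame is generated, as here, or loaded from a file.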

Load predefined datasets

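Seaborn provides access to a collection of well-known example datasets (seventeen at the time of writing; the number can vary between versions). They are downloaded from Seaborn's online data repository, so an internet connection is required. To list the available datasets, we can run the following code: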

import seaborn as sns
print(sns.get_dataset_names())

Let's choose the iris dataset. We can load the dataset by running the following code:

import seaborn as sns
data = sns.load_dataset("iris")
print(data.head())

Basic plots

Seaborn provides us with built-in functions to visualize our data using basic plots.

Scatter plot

A scatter plot visually represents the relationship between two variables by plotting points on a Cartesian plane. It's useful for identifying patterns or correlations in data.

# Scatter Plot
plt.figure(figsize=(8, 6))
sns.scatterplot(x='sepal_length', y='sepal_width', hue='species', data=data)
plt.title('Scatter Plot of Sepal Length vs. Sepal Width')
plt.show()

Line plot

A line plot connects data points with straight lines, typically used to display data over time or sequential data points. It helps in visualizing trends or patterns in data.

# Line Plot
plt.figure(figsize=(8, 6))
sns.lineplot(x=data.index, y='sepal_length', data=data)
plt.title('Line Plot of Sepal Length')
plt.xlabel('Index')
plt.ylabel('Sepal Length (cm)')
plt.show()

Bar plot

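A bar plot displays data using rectangular bars with lengths proportional to the values they represent. In Seaborn, sns.barplot aggregates the values in each category (the mean by default) and draws an error bar around each estimate. It's effective for comparing a summary statistic across categories.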

# Bar Plot
plt.figure(figsize=(8, 6))
sns.barplot(x='species', y='sepal_length', data=data)
plt.title('Bar Plot of Sepal Length by Species')
plt.xlabel('Species')
plt.ylabel('Sepal Length (cm)')
plt.show()
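The bar heights come from an aggregation step that Seaborn performs before drawing. The sketch below reproduces that step with pandas, assuming the default mean estimator; the small DataFrame is made up for illustration and is not the iris data.

```python
import pandas as pd

# Hypothetical measurements (made-up values, two placeholder categories)
df = pd.DataFrame({
    "species": ["a", "a", "b", "b", "b"],
    "sepal_length": [5.0, 5.2, 6.0, 6.4, 6.2],
})

# With the default estimator, each bar height corresponds to the group mean
bar_heights = df.groupby("species")["sepal_length"].mean()
print(bar_heights)
```

Computing the aggregate by hand like this is a quick way to check what a bar plot is actually showing.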

Histogram

A histogram displays the distribution of numerical data by dividing it into bins and showing the frequency of each bin. It's useful for understanding the underlying distribution of a dataset.

# Histogram
plt.figure(figsize=(8, 6))
sns.histplot(data['sepal_length'], bins=20)
plt.title('Histogram of Sepal Length')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Frequency')
plt.show()
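The binning behind a histogram can be sketched with NumPy alone. This example uses made-up values and explicit bin limits (both assumptions for illustration) so that the bin edges fall on whole numbers.

```python
import numpy as np

# Hypothetical sample (made-up values)
values = np.array([4.6, 4.9, 5.1, 5.4, 5.8, 6.3, 6.7, 7.2])

# Split the range 4 to 8 into 4 equal-width bins and count the values in each
counts, edges = np.histogram(values, bins=4, range=(4.0, 8.0))

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"[{left:.1f}, {right:.1f}): {count}")
```

Each printed count is the height of one histogram bar; changing the number of bins changes how coarse or fine the picture of the distribution is.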

Box plot

A box plot visualizes the distribution of numerical data through quartiles, providing insights into the central tendency, variability, and outliers of the dataset.

# Box Plot
plt.figure(figsize=(8, 6))
sns.boxplot(x='species', y='sepal_length', data=data)
plt.title('Box Plot of Sepal Length by Species')
plt.xlabel('Species')
plt.ylabel('Sepal Length (cm)')
plt.show()
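The numbers a box plot draws can be computed directly. The sketch below assumes the common convention of whiskers reaching at most 1.5 times the interquartile range beyond the box, and uses a made-up sample rather than the iris data.

```python
import numpy as np

# Hypothetical sample (made-up values), including one unusually large value
values = np.array([4.3, 4.9, 5.0, 5.1, 5.4, 5.8, 6.1, 7.9])

# The box spans the first to third quartile; the line inside is the median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn individually as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"Q1={q1:.3f}, median={median:.3f}, Q3={q3:.3f}, IQR={iqr:.3f}")
print("Outliers:", outliers)
```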

Count plot

A count plot represents the frequency of unique values in a dataset, often used for categorical data to show the distribution of different categories.

# Count Plot
plt.figure(figsize=(8, 6))
sns.countplot(x='species', data=data)
plt.title('Count Plot of Species')
plt.xlabel('Species')
plt.ylabel('Count')
plt.show()
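The heights in a count plot are simply the number of rows per unique value. A minimal sketch with pandas, using a short made-up Series in place of a real column:

```python
import pandas as pd

# Hypothetical categorical column (made-up values)
species = pd.Series(
    ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]
)

# value_counts tallies how often each unique value occurs
counts = species.value_counts()
print(counts)
```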

Point plot

A point plot shows a point estimate of a numeric variable (the mean by default) for each category, with error bars for the confidence interval and lines connecting the points. It's helpful for comparing groups or conditions in an experiment or study.

# Point Plot
plt.figure(figsize=(8, 6))
sns.pointplot(x='species', y='sepal_length', data=data)
plt.title('Point Plot of Sepal Length by Species')
plt.xlabel('Species')
plt.ylabel('Sepal Length (cm)')
plt.show()
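The error bars around each point are confidence intervals for the estimate. One common way to obtain such an interval is bootstrapping. The sketch below assumes a 95% percentile bootstrap of the mean with 1000 resamples; the sample values and the seed are made up for illustration.

```python
import numpy as np

# Assumption: fixed seed so the resampling is reproducible
rng = np.random.default_rng(seed=0)

# Hypothetical measurements for a single category (made-up values)
values = np.array([5.1, 4.9, 4.7, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4])

# Resample with replacement 1000 times and record the mean of each resample
resamples = rng.choice(values, size=(1000, values.size), replace=True)
boot_means = resamples.mean(axis=1)

# The middle 95% of the resampled means forms the confidence interval
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean={values.mean():.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

A narrow interval suggests the estimate is stable under resampling; a wide one suggests that more data is needed before comparing groups.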

Advanced plots

Along with basic plots, we can also create advanced plots using Seaborn. These plots combine several views or statistical summaries in one figure, which helps reveal relationships among multiple variables.

Pair plot

A pair plot displays pairwise relationships between different variables in a dataset. It shows scatterplots for each pair of variables and histograms for each variable along the diagonal, making it easy to visualize correlations and distributions within the dataset.

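# Pairplot
g = sns.pairplot(data)
# pairplot returns a grid of subplots, so set a figure-level title;
# plt.title would only label the last subplot
g.figure.suptitle("Pairplot of Iris Dataset", y=1.02)
plt.show()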

Joint plot

A joint plot displays the relationship between two variables along with their individual distributions. It typically includes a scatterplot with marginal histograms or kernel density estimates, providing insights into the correlation and the distribution of the variables.

# Jointplot
sns.jointplot(x='petal_length', y='petal_width', data=data, kind='scatter')
plt.show()

Violin plot

A violin plot displays the distribution of a numeric variable for different categories or groups. It combines a kernel density estimate with a miniature box plot drawn inside, showing the shape of the distribution along with summary statistics such as the median and quartiles.

# Violinplot
sns.violinplot(x='species', y='sepal_length', data=data)
plt.title("Violinplot of Sepal Length by Species")
plt.show()

Kernel Density Estimate (KDE) plot

A KDE plot visualizes the probability density function of a continuous variable. It provides a smoothed representation of the distribution of the data, making it easier to identify peaks, valleys, and the overall shape of the distribution.

# KDEplot
sns.kdeplot(data['sepal_length'], fill=True)  # fill replaces the deprecated shade parameter
plt.title("KDEplot of Sepal Length")
plt.show()
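The smoothing behind a KDE can be written out in a few lines: every observation contributes a small bell curve, and the curves are summed. The sketch below assumes a Gaussian kernel and a bandwidth chosen by Scott's rule, with a made-up sample; it illustrates the idea and is not Seaborn's internal code.

```python
import numpy as np

# Hypothetical sample (made-up values)
values = np.array([4.6, 4.9, 5.0, 5.1, 5.4, 5.8, 6.1, 6.7])
n = values.size

# Assumption: Scott's rule for one-dimensional data
bandwidth = values.std(ddof=1) * n ** (-1 / 5)

# Evaluate the density on a grid that extends past the data on both sides
grid = np.linspace(values.min() - 3 * bandwidth, values.max() + 3 * bandwidth, 200)

# Sum one Gaussian bump per observation, then normalize
z = (grid[:, None] - values[None, :]) / bandwidth
density = np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

# Sanity check: a density should integrate to roughly 1 (trapezoid rule)
area = np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid))
print(f"bandwidth={bandwidth:.3f}, area under curve={area:.3f}")
```

A larger bandwidth gives a smoother curve that can hide separate peaks, while a smaller one follows individual observations more closely.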

Heatmap

A heatmap encodes the values of a matrix as colors. Here it visualizes the correlation matrix of the dataset, which makes strongly related variables easy to spot. With the coolwarm colormap, warmer (red) cells mark higher correlation values and cooler (blue) cells mark lower or negative ones.

# Heatmap
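# Keep only the numeric columns; using a new variable leaves data (with 'species') intact for later plots
numeric_data = data.drop(columns=['species'])
corr_matrix = numeric_data.corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')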
plt.title("Heatmap of Correlation Matrix")
plt.show()
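Each cell of the correlation matrix is one Pearson correlation coefficient, which is what DataFrame.corr computes by default. The sketch below checks one cell by hand; the DataFrame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric columns (made-up values)
df = pd.DataFrame({
    "length": [1.0, 2.0, 3.0, 4.0],
    "width": [2.0, 4.1, 5.9, 8.0],
    "depth": [4.0, 3.0, 2.0, 1.0],
})

# Pairwise Pearson correlations between all columns
corr = df.corr()

# The same coefficient for one pair, written out from its definition
x = df["length"] - df["length"].mean()
y = df["width"] - df["width"].mean()
r_manual = (x * y).sum() / np.sqrt((x**2).sum() * (y**2).sum())

print(corr.round(3))
print(f"length vs. width, computed by hand: {r_manual:.3f}")
```

The matrix is symmetric with ones on the diagonal, so only the cells on one side of the diagonal carry distinct information.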

FacetGrid

A FacetGrid divides a dataset into subsets based on one or more categorical variables and creates a separate plot for each subset. It allows for comparison between different groups or categories within the dataset.

# FacetGrid
g = sns.FacetGrid(data, col='species')
g.map(sns.scatterplot, 'petal_length', 'petal_width')
plt.show()

Regplot

A regplot (regression plot) displays the relationship between two variables and fits a linear regression model to the data. It provides insights into the strength and direction of the relationship, along with the uncertainty associated with the regression line.

# Regplot
sns.regplot(x='sepal_length', y='sepal_width', data=data)
plt.title("Regplot of Sepal Length vs. Sepal Width")
plt.show()
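The straight line in a regression plot is an ordinary least-squares fit. A minimal sketch of that fit using np.polyfit with degree 1, on made-up points; this shows the idea rather than Seaborn's internal code.

```python
import numpy as np

# Hypothetical paired observations (made-up values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.0])

# Least-squares fit of a degree-1 polynomial: y ≈ slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)

# Predicted values along the fitted line
y_hat = slope * x + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The sign of the slope gives the direction of the relationship, and the scatter of the points around the line indicates how strong it is.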

Categorical plot

A categorical plot visualizes the distribution of a numeric variable across different categories or groups. In Seaborn, sns.catplot can take various forms, such as box plots, violin plots, or bar plots, selected through the kind parameter, providing insights into how the distribution varies between categories.

# Categorical Plot
sns.catplot(x='species', y='petal_length', data=data, kind='box')
plt.show()

Copyright ©2025 Educative, Inc. All rights reserved