Data visualization with Matplotlib

Key takeaways:

  • Matplotlib supports a wide range of plot types, from basic charts to advanced 3D and animated plots.

  • Matplotlib offers extensive options to tailor plots and integrates well with libraries like pandas and NumPy.

  • Matplotlib is beginner-friendly: its simple interface sits alongside advanced features for professional-grade visualizations.

Matplotlib is a versatile Python library that lets data scientists and analysts visualize data in many ways, from straightforward line plots to complex 3D representations. Its extensive customization options let users tailor plots to specific needs, which supports data exploration and insight extraction.

Installing Matplotlib

Use the pip command to install this library:

 pip install matplotlib

Importing pyplot from matplotlib

The Matplotlib library contains the pyplot module, which offers a MATLAB-like interface for making visualizations. It offers a stateful approach, meaning that each function call modifies the current figure or axes. This makes it easy to create quick and simple plots without needing to explicitly create figure and axes objects.

import matplotlib.pyplot as plt
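
As a minimal sketch of what "stateful" means here, the snippet below draws the same line twice: once through pyplot's implicit current figure and axes, and once through explicitly created figure and axes objects. The data values and the variable names (implicit_ax, fig, ax) are illustrative assumptions.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]  # illustrative data
y = [0, 1, 4, 9]

# Stateful (pyplot) style: each call acts on the "current" figure and axes,
# which pyplot creates on demand.
plt.figure()
plt.plot(x, y)
plt.title("Stateful interface")
implicit_ax = plt.gca()  # the axes that the calls above modified

# Explicit style: create the figure and axes first, then call methods on them.
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("Explicit interface")
```

Calling plt.show() afterwards would display both figures. The stateful style is shorter for quick plots, while the explicit style makes it clearer which axes each call changes.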

Why use Matplotlib?

  • Comprehensive visualization tools: Matplotlib covers a wide variety of plot types and supports advanced features like subplots, annotations, and 3D visualizations.

  • Highly customizable: Create professional, publication-ready graphs by tweaking fonts, colors, line styles, and more.

  • Seamless integration: Works with other libraries like NumPy, pandas, and seaborn for extended functionality.

  • Open source and widely supported: Free to use, with active community support and extensive documentation.
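
To illustrate the integration point in the list above, here is a small sketch that passes both NumPy arrays and pandas DataFrame columns straight to a plotting call. The data and the column names (x, sin_x) are made up for the example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative data: a NumPy array and a pandas DataFrame built from it
x = np.linspace(0, 2 * np.pi, 50)
df = pd.DataFrame({"x": x, "sin_x": np.sin(x)})

fig, ax = plt.subplots()
ax.plot(x, np.cos(x), label="cos(x) from NumPy arrays")
ax.plot(df["x"], df["sin_x"], label="sin(x) from DataFrame columns")
ax.legend()
```

No conversion step is needed in either case; add plt.show() to display the figure.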

Plotting with Matplotlib

We can create a wide variety of plots using Matplotlib. Some examples are listed below:

  • Line charts: Best for visualizing trends over time or other continuous data.
  • Bar charts: Ideal for comparing categories or groups.
  • Histograms: Represent the frequency distribution of numerical data.
  • Scatter plots: Highlight relationships or correlations between two variables.
  • Pie charts: Show proportions of a whole.
  • Subplots: Enable multiple plots in a single figure for side-by-side comparisons.

Basics of plotting using Matplotlib

A plot contains a few important elements that you can add using this library:

  1. Adding a title: Sets the main title of the plot.

matplotlib.pyplot.title(label, fontdict=None, loc='center', pad=None, **kwargs)
Adding the title of the plot
  2. Adding X and Y labels: Sets the x-axis and y-axis labels to describe the data.

matplotlib.pyplot.xlabel(xlabel, fontdict=None, labelpad=None, **kwargs)
matplotlib.pyplot.ylabel(ylabel, fontdict=None, labelpad=None, **kwargs)
Adding labels to the x- and y-axes
  3. Setting limits and tick labels: Defines the range of values displayed on the axes and customizes the tick marks and their labels.

matplotlib.pyplot.xlim(left, right)
matplotlib.pyplot.ylim(bottom, top)
matplotlib.pyplot.xticks([x1, x2, x3], ['label1', 'label2', 'label3'])
matplotlib.pyplot.yticks([y1, y2, y3], ['label1', 'label2', 'label3'])
Setting limits and tick labels of the plot
  4. Adding legends: Creates a legend to identify different plot elements.

matplotlib.pyplot.legend(['label1', 'label2', 'label3'])
Adding legends to the plot
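
The elements listed above can be combined in one small plot. The following is a sketch with made-up data (months and sales are hypothetical names and values) that sets a title, axis labels, axis limits, tick labels, and a legend.

```python
import matplotlib.pyplot as plt

# Illustrative data (not from the article)
months = [1, 2, 3, 4]
sales = [10, 14, 9, 17]

plt.figure()
plt.plot(months, sales, label="Sales")
plt.title("Monthly sales", loc="left")  # main title, left-aligned
plt.xlabel("Month")                     # x-axis label
plt.ylabel("Units sold")                # y-axis label
plt.xlim(0, 5)                          # visible x range
plt.ylim(0, 20)                         # visible y range
# Tick positions on the x-axis and the text shown at each position
plt.xticks([1, 2, 3, 4], ["Jan", "Feb", "Mar", "Apr"])
plt.legend()                            # uses the label passed to plt.plot
ax = plt.gca()                          # keep a handle on the axes for inspection
```

Here the legend text comes from the label argument of plt.plot() rather than from a list passed to plt.legend(); both forms are supported.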

Line chart

In Matplotlib, a line chart is a graphic depiction of data points joined by straight lines. It is helpful for displaying correlations, trends, and patterns among continuous variables or over time.

# Importing required libraries
import numpy as np
import matplotlib.pyplot as plt
# Generation of variables
x = np.arange(0, 10)  # Array of integers from 0 to 9
y = x**3
# Printing the variables
print(x)
print(y)
plt.plot(x, y)  # Function to plot
plt.title('Line Chart')  # Function to give title
# Functions to give x and y labels
plt.xlabel('X-Axis')
plt.ylabel('Y-Axis')
# Function to show the graph
plt.show()
A line chart showing the relationship between X and Y
  • plt.plot(x, y): This call generates a line plot, where the x and y values are plotted as points connected by straight line segments.

Multiple line chart

A multiple line chart in Matplotlib is a visualization technique used to compare trends of multiple datasets over a common x-axis.

# Importing required libraries
import numpy as np
import matplotlib.pyplot as plt
# Generation of the first set of variables
x = np.arange(0, 11)
y = x**3
# Generation of the second set of variables
x2 = np.arange(0, 11)
y2 = (x2**3) / 2
# Printing all variables
print(x, y, x2, y2, sep="\n")
# "linewidth" is used to specify the width of the lines
# "color" is used to specify the colour of the lines
# "label" is used to specify the name of each line shown in the legend
plt.plot(x, y, color='r', label='first data', linewidth=5)
plt.plot(x2, y2, color='y', label='second data', linewidth=5)
plt.title('Multiline Chart')
plt.ylabel('Y axis')
plt.xlabel('X axis')
# Shows the legend, built from the label arguments, in the best position with respect to the graph
plt.legend()
plt.show()
Generating and plotting multiline graphs with custom labels and line widths
  • The two plt.plot() calls: These calls draw two lines on the same axes with additional customization: color ('r', 'y'), line width (linewidth=5), and legend labels (label='first data', label='second data').

Bar chart

A bar chart is a data visualization in which categories are represented by rectangular bars or columns. Each bar’s length reflects the value it stands for.

# Importing required libraries
import matplotlib.pyplot as plt
# Generation of variables
x = ['India', 'USA', 'Japan', 'Australia', 'Italy']
y = [6, 7, 8, 9, 2]
# Printing the variables
print(x)
print(y)
plt.bar(x, y, label='Bars1', color='r')  # Function to plot
# Functions to give x and y labels
plt.xlabel("Country")
plt.ylabel("Inflation Rate (%)")
# Function to give heading of the chart
plt.title("Bar Graph")
# Function to show the chart
plt.show()
Bar chart representation of inflation rates in different countries
  • plt.bar(x, y, ...): This call generates a bar chart with one bar per category in x and heights given by y. The color is set to red (color='r'), and the label ('Bars1') would appear in a legend if one were added with plt.legend().

Multiple bar chart

A multiple bar chart, also known as a grouped bar chart, is used to compare multiple categories across different groups. It’s particularly useful for visualizing comparisons between different groups or time periods.

# Importing required libraries
import matplotlib.pyplot as plt
# Generation of the first set of variables
x = ['India', 'USA', 'Japan', 'Australia', 'Italy']
y = [6, 7, 8, 9, 5]
# Generation of the second set of variables
x2 = ['India', 'USA', 'Japan', 'Australia', 'Italy']
y2 = [5, 1, 3, 4, 2]
# Printing all variables
print(x, y, x2, y2, sep="\n")
# Functions to plot
# Both calls use the same x positions, so the second set is drawn over the first
plt.bar(x, y, label='Inflation', color='y')
plt.bar(x2, y2, label='Growth', color='g')
# Functions to give x and y labels
plt.xlabel("Country")
plt.ylabel("Inflation & Growth Rate (%)")
plt.title("Multiple Bar Graph")
plt.legend()
plt.show()
Creating a comparative bar chart displaying inflation and growth rates for five countries
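  • The two plt.bar() calls: These calls draw two sets of bars on the same axes. Each set is given a label (label='Inflation', label='Growth') and a different color ('y', 'g'). Because both sets share the same x positions, the second set is drawn on top of the first; it stays readable here only because every growth value is smaller than the matching inflation value.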
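
A grouped layout places the bars for each country side by side rather than at the same position. One common way to get that, sketched below, is to plot against numeric positions and shift each series by half a bar width. The names positions and width, and the width value of 0.4, are assumptions chosen for this illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

countries = ["India", "USA", "Japan", "Australia", "Italy"]
inflation = [6, 7, 8, 9, 5]
growth = [5, 1, 3, 4, 2]

positions = np.arange(len(countries))  # one numeric slot per country
width = 0.4                            # assumed bar width; two bars fill 0.8 of a slot

fig, ax = plt.subplots()
# Shift each series half a bar width left or right of the slot centre
ax.bar(positions - width / 2, inflation, width, label="Inflation", color="y")
ax.bar(positions + width / 2, growth, width, label="Growth", color="g")
# Put the country names back on the x-axis at the slot centres
ax.set_xticks(positions)
ax.set_xticklabels(countries)
ax.set_xlabel("Country")
ax.set_ylabel("Inflation & Growth Rate (%)")
ax.set_title("Grouped Bar Graph")
ax.legend()
```

With this layout, neither series can hide the other, whatever the values are.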

Histogram

A histogram graphically represents the distribution of numerical data. It divides the range of the data into bins and counts the number of data points that fall into each bin. The height of each bar in the histogram shows the frequency of data points within that bin.

# Importing required libraries
import matplotlib.pyplot as plt
# Generation of variable
stock_prices = [32,67,43,56,45,43,42,46,48,53,73,55,54,56,43,55,54,20,33,65,62,51,79,31,27]
# Setting the size of the figure
plt.figure(figsize=(8, 5))
# Function to plot the histogram with 5 bins
plt.hist(stock_prices, bins=5)
# Function to show the chart
plt.show()
Visualizing the distribution of stock prices across different bins
  • plt.hist(stock_prices, bins=5): This call creates a histogram of the stock_prices data. It divides the data into 5 equal-width bins (bins=5) and shows the frequency distribution.
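
To see the binning behind the picture, note that the histogram call also returns the per-bin counts and the bin edges. The sketch below prints them for the same stock_prices data; the variable names counts, bin_edges, and patches are chosen for this example.

```python
import matplotlib.pyplot as plt

stock_prices = [32, 67, 43, 56, 45, 43, 42, 46, 48, 53, 73, 55, 54,
                56, 43, 55, 54, 20, 33, 65, 62, 51, 79, 31, 27]

fig, ax = plt.subplots(figsize=(8, 5))
# hist returns the count per bin, the bin edges, and the drawn bar objects
counts, bin_edges, patches = ax.hist(stock_prices, bins=5)

# Five bins need six edges; consecutive edges mark where each bin starts and ends
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {int(count)} prices")
```

The counts add up to the number of data points, which is a quick check that every value landed in a bin.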

Scatter plot

A scatter plot represents data points graphically on a two-dimensional plane. It’s helpful for illustrating how two numerical variables relate to one another. Each data point is drawn as a dot whose location is set by its x- and y-coordinates.

# Importing required libraries
import matplotlib.pyplot as plt
# Generation of x and y variables
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [5, 2, 4, 2, 1, 4, 5, 2]
# Function to plot the graph
plt.scatter(x, y)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Scatter Plot')
# Function to show the graph
plt.show()
Scatter plot of two variables
  • plt.scatter(x, y): This call generates a scatter plot, where individual points are plotted based on their coordinates (x and y).

Pie chart

A pie chart is a circular diagram with slices that each show a different percentage of the total. It’s useful for visualizing categorical data and showing the relative sizes of different categories.

# Importing required libraries
import pandas as pd
import matplotlib.pyplot as plt
# Collection of raw data
raw_data = {'names': ['Nick', 'Sani', 'John', 'Rubi', 'Maya'],
            'jan_score': [123, 124, 125, 126, 128],
            'feb_score': [23, 24, 25, 27, 29],
            'march_score': [3, 5, 7, 6, 9]}
# Organizing the raw data into a usable form (a DataFrame)
df = pd.DataFrame(raw_data, columns=['names', 'jan_score', 'feb_score', 'march_score'])
df['total_score'] = df['jan_score'] + df['feb_score'] + df['march_score']
# Printing the data
print(df)
# Function to plot the graph
plt.pie(df['total_score'], labels=df['names'], autopct='%.2f%%')
# Equal aspect ratio so the pie is drawn as a circle
plt.axis('equal')
plt.show()
Pie chart showing the distribution of total scores for each person
  • plt.pie(...): This call creates a pie chart in which each slice represents one person's total_score. The slices are labeled with the names, and each slice's percentage of the whole is displayed with two decimal places (autopct='%.2f%%').
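
The percentages shown on the slices are each value's share of the sum of all values. As a rough worked check, the sketch below computes those shares directly with pandas; the share_pct column is a hypothetical name added only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "names": ["Nick", "Sani", "John", "Rubi", "Maya"],
    "jan_score": [123, 124, 125, 126, 128],
    "feb_score": [23, 24, 25, 27, 29],
    "march_score": [3, 5, 7, 6, 9],
})
df["total_score"] = df["jan_score"] + df["feb_score"] + df["march_score"]

# Share of the whole, in percent: value / sum of all values * 100
df["share_pct"] = 100 * df["total_score"] / df["total_score"].sum()

print(df[["names", "total_score", "share_pct"]].round(2))
```

The shares add up to 100, which is why a pie chart suits data that represents parts of a single whole.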

Advanced plotting: Subplots

Using subplots, you can create several plots inside a single figure. This is helpful for visualizing several variables, comparing different datasets, and decomposing complex data into smaller, more focussed plots.

# Importing required libraries
import numpy as np
import matplotlib.pyplot as plt
# Defining the size of the figure
plt.figure(figsize=(10,10))
# Generation of variables
x = np.array([1,2,3,4,5,6,7,8])
y = np.array([5,2,4,2,1,4,5,2])
# Generating 4 subplots in form of 2x2 matrix
# In the line below the arguments of plt.subplot are as follows:
# 2- no. of rows
# 2- no. of columns
# 1- position in matrix
# Position (0,0)
plt.subplot(2,2,1)
plt.plot(x,y,'g')
plt.title('Sub Plot 1')
plt.xlabel('X-Axis')
plt.ylabel('Y-Axis')
# Position (0,1)
plt.subplot(2,2,2)
plt.plot(y,x,'b')
plt.title('Sub Plot 2')
plt.xlabel('X-Axis')
plt.ylabel('Y-Axis')
# Position (1,0)
plt.subplot(2,2,3)
plt.plot(y*2,x*2,'y')
plt.title('Sub Plot 3')
plt.xlabel('X-Axis')
plt.ylabel('Y-Axis')
# Position (1,1)
plt.subplot(2,2,4)
plt.plot(x*2,y*2,'m')
plt.title('Sub Plot 4')
plt.xlabel('X-Axis')
plt.ylabel('Y-Axis')
# Function for layout and spacing
plt.tight_layout(h_pad=5, w_pad=10)
# Function to show the figure
plt.show()
4-Plot matrix visualization of different X-Y relationships in subplots
  • plt.subplot(2, 2, n): Each call selects one cell of a 2-row by 2-column grid in the same figure, where n (1 to 4) is the position counted row by row. The plotting calls that follow it draw into that cell, so each subplot contains a different plot.
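
The same 2x2 grid can also be built with a single plt.subplots() call and a loop, which avoids repeating the title and label calls for every panel. This is an alternative sketch, not the approach used above; the panels list and the loop variable names are assumptions made for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([5, 2, 4, 2, 1, 4, 5, 2])

# One call creates the figure and a 2x2 array of axes
fig, axes = plt.subplots(2, 2, figsize=(10, 10))

# (x data, y data, color) for each panel, in row-by-row order
panels = [(x, y, "g"), (y, x, "b"), (y * 2, x * 2, "y"), (x * 2, y * 2, "m")]

for number, (ax, (xs, ys, color)) in enumerate(zip(axes.flat, panels), start=1):
    ax.plot(xs, ys, color)
    ax.set_title(f"Sub Plot {number}")
    ax.set_xlabel("X-Axis")
    ax.set_ylabel("Y-Axis")

# Same spacing as in the pyplot version
fig.tight_layout(h_pad=5, w_pad=10)
```

Indexing works as with any NumPy array, so axes[0, 0] is the top-left panel and axes[1, 1] is the bottom-right one.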


Conclusion

Matplotlib is a robust and flexible library for data visualization in Python. Its extensive customization options, compatibility with other libraries, and range of visualization types make it an essential tool for anyone working with data. Whether you’re a beginner exploring simple plots or an expert creating complex visualizations, Matplotlib has you covered.

Frequently asked questions



What are the advantages of using Matplotlib?

It’s versatile, highly customizable, and integrates well with libraries like pandas and NumPy.


How do I create interactive visualizations with Matplotlib?

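Matplotlib figures support basic interactivity, such as panning and zooming, when shown with an interactive backend, and the library also provides widgets and event handling. For richer, web-based interactivity, separate libraries like Plotly and Bokeh are common choices.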


Can Matplotlib create real-time visualizations?

Yes, using the animation module (matplotlib.animation), you can create real-time or animated plots.
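
As a minimal sketch of that module, the snippet below uses FuncAnimation to redraw a sine wave with a growing phase shift. The update function, the frame count, the 50 ms interval, and the phase step of 0.1 are all assumptions chosen for the example.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots()
(line,) = ax.plot(x, np.sin(x))

def update(frame):
    """Shift the sine wave a little further on every frame."""
    line.set_ydata(np.sin(x + 0.1 * frame))
    return (line,)

# Keep a reference to the animation object so it is not garbage-collected
anim = FuncAnimation(fig, update, frames=100, interval=50, blit=True)
```

Calling plt.show() plays the animation in an interactive window. For live data, the update function could read the newest values instead of computing a sine wave.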


Which is better, seaborn or Matplotlib?

The choice between seaborn and Matplotlib depends on your specific needs and preferences. Seaborn is generally preferred for its ease of use and attractive default styles, while Matplotlib offers more flexibility and customization options. If you’re new to data visualization or prioritize quick and visually appealing plots, seaborn is a great choice. If you need fine-grained control over every aspect of your plots or require advanced customization, Matplotlib is a more suitable option.

