Matplotlib is a Python library that specializes in both static and interactive representations of data. This library contains a vast set of charts and plots.
In this answer, we will be focusing on a special type of plot called an error bar. Before we do that, let's learn the concept behind error bars.
Error bars are a visualization method for depicting how spread out or concentrated the data is. Simply put, an error bar visualizes the variability in a set of values.
Highlights of error bar depiction |
Uncertainty in the measurement |
The precision of a measurement |
Variability of data |
To show this variation in data, we can draw a line from the actual point out to the level of uncertainty. This is exactly what an error bar does. Error bars are drawn as lines passing through a point on a plot, parallel to an axis of choice.
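As a minimal sketch of that idea, the snippet below draws three points with vertical error bars (`yerr`) and horizontal ones (`xerr`) using Matplotlib's `errorbar()`. The data values and error sizes are made up for illustration, and the `Agg` backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption so the sketch runs anywhere
import matplotlib.pyplot as plt

# Illustrative values only (not taken from the scenario in this answer)
x = [1, 2, 3]
y = [4.0, 6.5, 5.0]
y_uncertainty = [0.5, 1.2, 0.3]  # half-lengths of the bars parallel to the y-axis
x_uncertainty = [0.1, 0.1, 0.2]  # half-lengths of the bars parallel to the x-axis

fig, ax = plt.subplots()
# fmt='o' draws a marker at each point; capsize adds small caps at the bar ends
container = ax.errorbar(x, y, yerr=y_uncertainty, xerr=x_uncertainty,
                        fmt='o', capsize=4)
ax.set_xlabel('x')
ax.set_ylabel('y')
```

To view the figure interactively, remove the `matplotlib.use("Agg")` line and call `plt.show()` at the end.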
To fully grasp the concept of error bars, let's take an interesting scenario. Suppose we have several categories, each containing a certain count of data elements. The standard deviation of the data in each category will vary too. How do we show each category along with its standard deviation?
For such a scenario, we can create bar plots to show our data and error bars to show the standard deviation for each bar.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
We start by importing the necessary libraries:
numpy as np for numerical operations
matplotlib.pyplot as plt for plotting
cm from matplotlib to access colormaps
categories = ['A', 'B', 'C', 'D', 'E']
count = [80, 150, 30, 90, 110]
We define the categories using the strings ['A', 'B', 'C', 'D', 'E'] and the count of the values within each category as [80, 150, 30, 90, 110].
dataOfA = np.random.normal(count[0], 8, count[0])
dataOfB = np.random.normal(count[1], 20, count[1])
dataOfC = np.random.normal(count[2], 10, count[2])
dataOfD = np.random.normal(count[3], 3, count[3])
dataOfE = np.random.normal(count[4], 17, count[4])
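We generate random data for each category using the np.random.normal() function. For each category, the count serves as both the mean of the distribution and the number of samples drawn, while the second argument sets the standard deviation.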
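To make the three arguments concrete, here is a small sketch that draws one category's data and checks its shape and spread. It swaps in NumPy's seeded `default_rng` generator instead of the legacy `np.random.normal` call so that the result is reproducible; the seed and the variable names are assumptions for illustration.

```python
import numpy as np

# Seeded generator: an assumption for reproducibility, not part of the original code
rng = np.random.default_rng(seed=0)

count_a = 80
# loc = mean of the distribution, scale = its standard deviation, size = number of samples
samples = rng.normal(loc=count_a, scale=8, size=count_a)

print(samples.shape)   # one value per data element in the category
print(samples.mean())  # close to loc, but not exactly equal
print(samples.std())   # close to scale, but not exactly equal
```

Because the samples are random, the measured mean and standard deviation only approximate the `loc` and `scale` values passed in.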
stdOfA = np.std(dataOfA)
stdOfB = np.std(dataOfB)
stdOfC = np.std(dataOfC)
stdOfD = np.std(dataOfD)
stdOfE = np.std(dataOfE)
We calculate the standard deviation for each set of generated data using the np.std() function.
Note: The standard deviation provides us with a measure of the variability in the data.
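One detail worth knowing: by default, `np.std()` computes the population standard deviation (`ddof=0`). The sketch below, using a small made-up dataset, contrasts it with the sample standard deviation (`ddof=1`), which is often preferred when the data is a sample from a larger population.

```python
import numpy as np

# Small illustrative dataset (not from the scenario above); its mean is 5
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

population_std = np.std(data)      # divides by N (ddof=0, the default): 2.0
sample_std = np.std(data, ddof=1)  # divides by N - 1: approximately 2.14

print(population_std, sample_std)
```

For the category sizes used in this answer (30 to 150 values), the two results differ only slightly.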
Xcord = np.arange(len(categories))
For the visuals, we then create an array of x-coordinates using np.arange(), with one position per category.
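Since these x-coordinates are plain integers, the axis shows 0 to 4 unless the category names are attached to the ticks. The sketch below shows one way to do that with `set_xticks()` and `set_xticklabels()`; this labeling step is an optional addition rather than part of the original code, and the `Agg` backend is used only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

categories = ['A', 'B', 'C', 'D', 'E']
count = [80, 150, 30, 90, 110]

x_positions = np.arange(len(categories))  # array([0, 1, 2, 3, 4])

fig, ax = plt.subplots()
ax.bar(x_positions, count, width=0.6)

# Place one tick under each bar and label it with the category name
ax.set_xticks(x_positions)
tick_texts = ax.set_xticklabels(categories)
labels = [t.get_text() for t in tick_texts]
```

With the pyplot interface used in this answer, `plt.xticks(Xcord, categories)` achieves the same result.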
errorLen = [stdOfA, stdOfB, stdOfC, stdOfD, stdOfE]
To depict the error bars, we create a list errorLen that contains the standard deviation values for each category.
plt.style.use('dark_background')
colorStyle = cm.Set2
plt.bar(Xcord, count, color=colorStyle(np.arange(len(categories))), alpha=0.7, width=0.6)
We create a bar plot using plt.bar(). For this purpose, we pass the x-coordinates, the count values, the colors based on the colorStyle map, and other parameters like alpha (i.e., transparency) and width.
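The step-by-step walk-through stops at the bars, while the complete listing further down overlays the error bars with `plt.errorbar()`. Here is a minimal sketch of that step in isolation. The error lengths are placeholder numbers rather than computed standard deviations, the color is black because this sketch uses the default light style, and the `Agg` backend is used only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

count = [80, 150, 30, 90, 110]
x_positions = np.arange(len(count))
error_len = [8, 20, 10, 3, 17]  # placeholder spreads, not computed from data

fig, ax = plt.subplots()
bars = ax.bar(x_positions, count, alpha=0.7, width=0.6)

# fmt='none' suppresses the data markers and connecting line, so only the
# error bars are drawn on top of the bars; yerr gives each bar's half-length
err = ax.errorbar(x_positions, count, yerr=error_len, fmt='none',
                  color='black', capsize=5, elinewidth=1, capthick=2)
```

Each error bar extends from `count - error` to `count + error`, centered on the top of its bar.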
plt.xlabel('Categories')
plt.ylabel('Count')
plt.title('Errorbars for barplots depicting various categories')
plt.grid(True, color='gray', linestyle='dotted', linewidth=0.5, alpha=0.5)
plt.show()
Yay, our brilliant plot is now ready! Let's add a few customizations like axis labels, a title, and a dotted grid.
How are different standard deviation values related to error bars?
Bringing it all together, we get the following executable code.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

categories = ['A', 'B', 'C', 'D', 'E']
count = [80, 150, 30, 90, 110]

dataOfA = np.random.normal(count[0], 8, count[0])
dataOfB = np.random.normal(count[1], 20, count[1])
dataOfC = np.random.normal(count[2], 10, count[2])
dataOfD = np.random.normal(count[3], 3, count[3])
dataOfE = np.random.normal(count[4], 17, count[4])

stdOfA = np.std(dataOfA)
stdOfB = np.std(dataOfB)
stdOfC = np.std(dataOfC)
stdOfD = np.std(dataOfD)
stdOfE = np.std(dataOfE)

Xcord = np.arange(len(categories))
errorLen = [stdOfA, stdOfB, stdOfC, stdOfD, stdOfE]

plt.style.use('dark_background')
colorStyle = cm.Set2
plt.bar(Xcord, count, color=colorStyle(np.arange(len(categories))), alpha=0.7, width=0.6)
plt.errorbar(Xcord, count, yerr=errorLen, fmt='none', color='white', capsize=5, elinewidth=1, capthick=2)

plt.xlabel('Categories')
plt.ylabel('Count')
plt.title('Errorbars for barplots depicting various categories')
plt.grid(True, color='gray', linestyle='dotted', linewidth=0.5, alpha=0.5)
plt.show()
Do not hesitate to experiment with the code and rerun it to see how the plot changes!
The exciting part of the whole code is finally here! The bar plots depicting the data and the error bars showing the standard deviation can be clearly interpreted from the output below.
Some categories have shorter error bars, meaning that the variation in their data is small.
Some categories have comparatively longer error bars, which show that the variation in their data, i.e., the standard deviation, is larger.
It is not necessary that we depict only standard deviation in our code. Error bars can be of various types. Let's discuss a few to keep our options open while coding!
The standard error is used to represent the variability of the sample means so that the precision of the estimate of the population mean can be obtained.
The standard deviation measures the dispersion of individual data points around the mean value.
Error bars can be used to display the minimum and maximum values found in the data.
Type of bar | Representation |
Standard error | These bars indicate how much the sample mean is expected to vary between different samples of a population. |
Standard deviation | Error bars representing the standard deviation show the spread of the data in a sample. |
Range | These error bars extend above and below the mean, up to the maximum and down to the minimum value. |
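The three types in the table can be computed from the same data. The sketch below is one way to do it with NumPy, under a few assumptions: the sample is generated with a seeded `default_rng`, the sample standard deviation (`ddof=1`) is used, and all variable names are illustrative. For the range, the bar is asymmetric, so it is stored as a `(2, N)` array, which is the shape Matplotlib's `yerr` accepts for separate lower and upper lengths.

```python
import numpy as np

# Seeded illustrative sample: an assumption, not data from this answer
rng = np.random.default_rng(seed=1)
sample = rng.normal(loc=80, scale=8, size=80)

mean = sample.mean()

# Standard deviation: spread of the individual values around the mean
std = sample.std(ddof=1)

# Standard error: expected variability of the sample mean itself
sem = std / np.sqrt(sample.size)

# Range: distance from the mean down to the minimum and up to the maximum,
# arranged as [[lower], [upper]] with shape (2, 1) for a single bar
range_err = np.array([[mean - sample.min()],
                      [sample.max() - mean]])

print(std, sem, range_err.ravel())
```

Any of these can be passed as `yerr`. The standard error is always smaller than the standard deviation for more than one sample, so it produces shorter bars.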