How to make a boxplot in Polars using Matplotlib

A boxplot is a valuable tool for visualizing the distribution of data. While the Polars library itself doesn't offer direct support for creating boxplots, we can easily generate them from Polars data by integrating with the Matplotlib library.

The boxplot() function

The boxplot() function in Matplotlib creates boxplots, a common way to visualize a dataset's distribution through summary statistics such as the median, the quartiles, and any outliers.

Syntax

plt.boxplot(x, notch=None, sym=None, vert=None, whis=None, positions=None, widths=None, patch_artist=None, ...)

Parameters

Here are the main parameters of the boxplot() function and their explanations:

  • x: The data we want to plot. It can be a single array (one box) or a sequence of arrays (one box per array).

  • notch: If set to True, each box is drawn with a notch around the median that represents a confidence interval for the median.

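  • sym: The marker used for outliers (fliers). If left as None, Matplotlib uses its default flier style, which draws circles in current versions. We can pass a format string such as '+' to change the marker, or an empty string '' to hide the outliers.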

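  • vert: If set to True (the default), the boxes are drawn vertically; if set to False, they are drawn horizontally. Matplotlib 3.10 and later also provide an orientation parameter ('vertical' or 'horizontal') as the preferred replacement for vert.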

  • whis: Controls the whisker length as a multiple of the interquartile range (IQR = Q3 - Q1). The default, 1.5, is the conventional setting: the lower whisker extends from the box to the smallest data point at or above Q1 - 1.5 * IQR, and the upper whisker extends to the largest data point at or below Q3 + 1.5 * IQR. Data points outside this range are treated as outliers and drawn as individual points that are not connected to the whiskers. whis can also be a pair of percentiles, such as (5, 95), to place the whisker ends at those percentiles instead.

  • positions: The positions of the boxes along the category axis (the x-axis for vertical boxplots), given as an array-like of numbers. By default, the boxes are placed at 1, 2, ..., N.

  • widths: The width of the boxes, given either as a single scalar applied to every box or as an array-like with one width per box.

  • patch_artist: If set to True, the boxes are drawn as Patch objects (filled shapes) instead of plain lines, which lets us customize their appearance, for example by setting a fill color.

These parameters allow us to customize various aspects of the boxplot to suit our visualization needs. Depending on the data and the specific insights we want to convey, we can adjust these parameters accordingly when calling plt.boxplot().

Code example

Here is an example that demonstrates how to create a boxplot with Matplotlib using data from a Polars DataFrame:

```python
# Import required libraries
import polars as pl
import matplotlib.pyplot as plt

# Create a sample Polars DataFrame
data = pl.DataFrame({
    'Category': ['X', 'Y', 'Z', 'X', 'Y', 'Z', 'X', 'Y', 'Z', 'X', 'Y', 'Z'],
    'Value': [5, 8, 12, 6, 9, 14, 7, 10, 16, 20, 4, 11]
})

# Extract the columns we want to visualize as Python lists
categories = data['Category'].to_list()
values = data['Value'].to_list()

# Create an empty list to hold the values of each category
category_data = []

# Group values by category; sorting keeps the boxes and labels in the same fixed order
for category in sorted(set(categories)):
    category_values = [v for c, v in zip(categories, values) if c == category]
    category_data.append(category_values)

# Create a boxplot (tick_labels needs Matplotlib 3.9+; older versions use labels=)
plt.figure(figsize=(8, 6))
plt.boxplot(category_data, tick_labels=sorted(set(categories)))
plt.xlabel('Category')
plt.ylabel('Value')
plt.title('Boxplot')
plt.show()
```

Explanation

In the above code:

  • Lines 6–9: We create a Polars DataFrame called data with two columns: Category and Value.

  • Lines 12–13: We extract the Category and Value columns into Python lists using to_list() so that we can group the values by category and pass them to Matplotlib.

  • Line 16: We create an empty list called category_data to store data for each category.

  • Lines 19–21: We iterate through the unique categories in the Category column, collect the corresponding Value entries for each category into a list, and append that list to category_data.

  • Lines 24–28: We create a figure and draw the boxplot with Matplotlib, passing the category_data list along with the category names as tick labels. We then set the axis labels and the title.

  • Line 29: Finally, we display the boxplot using plt.show().

The code generates a boxplot that shows the distribution of Value for each unique Category in the sample dataset. The x-axis represents the categories ('X', 'Y', 'Z'), and the y-axis represents the values. For each category, the horizontal line inside the box marks the median, the box spans the IQR (from Q1 to Q3), and the whiskers extend to the most extreme data points that lie within 1.5 times the IQR of the box (the default whis=1.5). In this sample, the value 20 in category X lies above Q3 + 1.5 * IQR, so it is drawn as an individual outlier point.

Copyright ©2025 Educative, Inc. All rights reserved