How to create a violin plot with Plotly Express in Python

Plotly Express is a Python library that allows us to create plots quickly and easily, with customizable parameters and an interactive interface.

The violin plot is a type of data visualization that combines aspects of a box plot and a kernel density plot. It provides a concise summary of the distribution of a continuous variable and can optionally display the individual data points.

Features of the violin plot

Some of the key features of the violin plot include:

  • Grouping: Violin plots can be grouped by a categorical variable, allowing us to compare distributions across different groups. This is done by specifying the color parameter in the violin function, which assigns different colors to the violins based on the specified categorical variable.

  • Orientation: Violin plots can be plotted horizontally or vertically. The orientation can be controlled using the orientation parameter in the violin function: 'v' for vertical or 'h' for horizontal. If the parameter is not set, Plotly Express infers it from the data, which gives vertical violins when x is categorical and y is continuous.

  • Nested violin plots: We can split the plot by a second categorical variable using the facet_col or facet_row parameters. This creates a grid of subplots, one per category of the second variable, each containing the violins for the primary categories.

  • Violin mode: Each violin always displays a kernel density estimate of the data. When violins are grouped with the color parameter, the violinmode parameter controls how violins that share a position are arranged: 'group' places them side by side, and 'overlay' draws them on top of one another.

  • Box plot overlay: We can overlay a box plot on top of the violin plot to provide additional statistical information. This is achieved by setting the box parameter to True in the violin function. The box plot displays each category's median and quartiles.

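  • Data points: Individual data points can be displayed as markers alongside each violin, giving us a more detailed view of the data distribution. Which points appear is controlled by the points parameter in the violin function ('all', 'outliers', 'suspectedoutliers', or False). The marker style, size, and color can then be adjusted with fig.update_traces(marker=...).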

  • Styling and customization: Plotly Express provides extensive options for styling and customization. We can modify the violin plot's colors, line styles, fonts, and layout to match our preferences. Additionally, we can add titles, axis labels, and annotations to enhance the overall appearance and clarity of the plot.

Syntax

The violin function syntax typically follows this structure:

import plotly.express as px
fig = px.violin(df, x='category_column', y='continuous_column')
Syntax of the violin function

Parameters

Some commonly used parameters for creating violin plots with Plotly Express are as follows:

  • data_frame: The DataFrame or data array containing the data to be plotted. It is usually passed as the first positional argument.

  • x: The column name or array-like values representing the categorical variable on the x-axis.

  • y: The column name or array-like values representing the continuous variable on the y-axis.

  • color: Optional parameter specifying a column name or array-like values representing a categorical variable used for grouping and assigning colors to the violins.

  • orientation: Specifies the orientation of the violins. Use 'v' for vertical or 'h' for horizontal. If omitted, the orientation is inferred from the types of x and y.

  • violinmode: Specifies how violins that share a position are arranged when color grouping is used. Use 'group' to place them side by side or 'overlay' to draw them on top of one another.

  • box: Boolean parameter indicating whether to overlay a box plot on top of the violins. Set to True to include the box plot.

  • facet_col and facet_row: Optional parameters for creating nested violin plots based on a second categorical variable. facet_col creates a grid of violins with columns representing the second variable, while facet_row creates a grid with rows.

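  • points: Controls which data points are drawn alongside the violins. Options are 'outliers', 'suspectedoutliers', 'all', or False. Marker size, symbol, and color are not arguments of px.violin; we can set them afterward with fig.update_traces(marker=...).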

  • title: String that sets the plot title. Axis titles are not separate arguments of px.violin; we can set them through the labels dictionary or by passing xaxis_title and yaxis_title to fig.update_layout().

Return type

The px.violin() function returns a Plotly figure object that can be displayed with fig.show(). The figure object contains all the information required to produce the violin plot, including the data, layout, and style.

Implementation

In the following example, we create a violin plot using the iris sample dataset provided by Plotly Express. The attributes used (species and sepal_width) are defined as follows:

  • species: The species attribute represents the species of an iris flower. It is a categorical variable that can take three different values: "setosa", "versicolor", and "virginica". Each value corresponds to a different species of iris flower.

  • sepal_width: The sepal_width attribute represents the width of the sepal of an iris flower. It is a continuous numerical variable that represents a physical measurement in centimeters. The sepal is the outer part of the flower that protects the inner reproductive organs.

Create a violin plot of the iris dataset

Explanation

The code above is explained in detail below:

  • Lines 2–3: Import the required libraries for the code: plotly.express as px for creating the violin plot, and pandas as pd for handling data in a DataFrame.

  • Line 6: Load the iris dataset provided by Plotly Express into a pandas DataFrame called df. The px.data.iris() function retrieves the dataset.

  • Line 9: Print the first five rows of the loaded dataset. The head() function retrieves the top rows of the DataFrame, and print() displays the result in the console. This helps us quickly inspect the data and verify its structure.

  • Line 12: Create a violin plot using Plotly Express. The call specifies the DataFrame (df) as the data source, species as the x-axis variable, sepal_width as the y-axis variable, box=True to overlay a box plot on top of the violins, and points="all" to display individual data points alongside the violins. The resulting plot is stored in the fig variable.

  • Lines 15–19: Update the layout of the plot using the update_layout() method of the fig object. The specified arguments set the plot's title, x-axis, and y-axis titles.

  • Line 22: Display the plot using the fig.show() method, which shows the interactive plot.

Conclusion

The violin plot in Plotly Express is a useful tool for visualizing and comparing distributions of continuous variables across categories. It combines a kernel density estimate with an optional box plot and data points, and it supports grouping, faceting, and customization. With a single function call and an interactive result, Plotly Express makes violin plots easy to create and adjust, which helps in exploratory data analysis and when communicating distributions to an audience.


HowDev By Educative. Copyright ©2025 Educative, Inc. All rights reserved