How to plot Andrews curves in pandas

Overview

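Andrews curves visualize multidimensional (high-dimensional) data by mapping each observation $x = (x_1, x_2, \ldots, x_d)$ onto a function of a single variable $t$. This function is defined as follows:

$$f_x(t) = \frac{x_1}{\sqrt{2}} + x_2 \sin(t) + x_3 \cos(t) + x_4 \sin(2t) + x_5 \cos(2t) + \cdots$$

  • The $x_i$ coefficients are the values of each dimension of the observation.
  • $t$ is linearly spaced between $-\pi$ and $+\pi$.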
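As a minimal sketch, the Andrews function f_x(t) = x_1/√2 + x_2 sin(t) + x_3 cos(t) + x_4 sin(2t) + … can be evaluated directly with NumPy. The helper name andrews_curve is hypothetical and written from that formula; it is not pandas' internal code. Every coefficient after the first alternates between sin(kt) and cos(kt), and the harmonic k increases by one every two coefficients. The observation values are made up for illustration.

```python
import numpy as np


def andrews_curve(x, t):
    """Evaluate the Andrews function f_x(t) for one observation x.

    Hypothetical helper written from the definition:
    f_x(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ...
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    # Constant term: the first coefficient scaled by 1/sqrt(2).
    result = np.full_like(t, x[0] / np.sqrt(2.0))
    # Remaining coefficients alternate between sin(k t) and cos(k t).
    for i, coef in enumerate(x[1:]):
        k = i // 2 + 1  # harmonic number: 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 0:
            result += coef * np.sin(k * t)
        else:
            result += coef * np.cos(k * t)
    return result


# t is linearly spaced between -pi and +pi, as in the definition.
t = np.linspace(-np.pi, np.pi, 200)
y = andrews_curve([5.1, 3.5, 1.4, 0.2], t)  # made-up 4-dimensional observation
print(y.shape)  # (200,)
```

Each observation thus becomes one array of 200 curve heights, one per value of t.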

Andrews curves are known to preserve means, distances (up to a constant factor), and variances. As a result, curves that lie close together correspond to data points that are close together in the original space.
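The "distance up to a constant" property can be made concrete. Because 1/√2, sin(kt), and cos(kt) are mutually orthogonal on [−π, π] and each has squared norm π, the squared distance between two curves is ∫ (f_x(t) − f_y(t))² dt = π‖x − y‖², integrating over −π ≤ t ≤ π. The sketch below checks this numerically. The helper names andrews_curve and curve_distance_sq are hypothetical, the two observations are made up, and the integral is approximated with a rectangle rule on equally spaced points over one full period (endpoint excluded), which is exact for low-degree trigonometric polynomials like these.

```python
import numpy as np


def andrews_curve(x, t):
    """Evaluate f_x(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ..."""
    x = np.asarray(x, dtype=float)
    result = np.full_like(t, x[0] / np.sqrt(2.0))
    for i, coef in enumerate(x[1:]):
        k = i // 2 + 1  # harmonic number: 1, 1, 2, 2, ...
        result += coef * (np.sin(k * t) if i % 2 == 0 else np.cos(k * t))
    return result


def curve_distance_sq(x, y, n=1024):
    """Approximate the integral of (f_x - f_y)^2 over one period [-pi, pi).

    Equally spaced points without the endpoint make the rectangle rule
    exact for trigonometric polynomials of degree well below n.
    """
    t = np.linspace(-np.pi, np.pi, n, endpoint=False)
    diff = andrews_curve(x, t) - andrews_curve(y, t)
    return np.sum(diff ** 2) * (2 * np.pi / n)


# Two made-up 4-dimensional observations.
x = [5.1, 3.5, 1.4, 0.2]
y = [6.3, 2.9, 5.6, 1.8]

lhs = curve_distance_sq(x, y)                   # distance between the curves
rhs = np.pi * np.sum(np.subtract(x, y) ** 2)    # pi times Euclidean distance squared
print(np.isclose(lhs, rhs))  # True
```

Since the constant π is the same for every pair of observations, ranking pairs by curve distance gives the same order as ranking them by Euclidean distance.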

The andrews_curves() method in pandas

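The andrews_curves() method in pandas plots Andrews curves for a DataFrame. Each row of the DataFrame is drawn as a single curve: the class_column supplies the row's class, which determines the curve's color, and the remaining columns supply the coefficients $x_i$.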
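To make "one row, one curve" concrete, here is a from-scratch sketch with matplotlib. The helper plot_andrews is hypothetical and only mimics the idea: it drops the class column, evaluates the Andrews function of each remaining row, and gives every class one color from matplotlib's default color cycle. It is not pandas' implementation, and pandas' color and legend handling may differ. The small DataFrame is made up.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def andrews_curve(x, t):
    """f_x(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ..."""
    x = np.asarray(x, dtype=float)
    result = np.full_like(t, x[0] / np.sqrt(2.0))
    for i, coef in enumerate(x[1:]):
        k = i // 2 + 1  # harmonic number: 1, 1, 2, 2, ...
        result += coef * (np.sin(k * t) if i % 2 == 0 else np.cos(k * t))
    return result


def plot_andrews(frame, class_column, samples=200, ax=None):
    """Hypothetical helper: one curve per row, one color per class."""
    if ax is None:
        ax = plt.gca()
    t = np.linspace(-np.pi, np.pi, samples)
    values = frame.drop(columns=class_column)  # remaining columns are the x_i
    classes = frame[class_column].unique()
    colors = {cls: f"C{i}" for i, cls in enumerate(classes)}  # default color cycle
    seen = set()
    for (_, row), cls in zip(values.iterrows(), frame[class_column]):
        # Label only the first curve of each class: one legend entry per class.
        kwargs = {} if cls in seen else {"label": str(cls)}
        seen.add(cls)
        ax.plot(t, andrews_curve(row.to_numpy(), t), color=colors[cls], **kwargs)
    ax.legend()
    return ax


# Small made-up frame: two numeric columns plus a class column.
df = pd.DataFrame(
    {"a": [1.0, 1.1, 3.0], "b": [0.5, 0.4, -1.0], "label": ["x", "x", "y"]}
)
fig, ax = plt.subplots()
plot_andrews(df, "label", ax=ax)
print(len(ax.get_lines()))  # 3: one curve per row
```

The two rows of class "x" share a color and have similar values, so their curves run close together, while the class "y" row stands apart.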

Syntax

pandas.plotting.andrews_curves(frame, class_column, ax=None, samples=200, color=None, colormap=None, **kwargs)

Parameters

  • frame: This is the DataFrame to plot.
  • class_column: This is the name of the column containing class names.
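  • ax: This is the matplotlib axes object to draw on. If None, the current axes are used.
  • samples: This is the number of points to plot in each curve (200 by default).
  • color: This can be a list or tuple of colors to use for the different classes.
  • colormap: This can be a string or a matplotlib colormap object from which the class colors are selected.
  • **kwargs: These are additional options passed to the matplotlib plotting method.

The method returns the matplotlib axes object containing the plot.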
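The sketch below exercises several of these parameters on a small made-up DataFrame. It draws onto an existing axes via ax, lowers samples to 100, picks class colors from the "viridis" colormap, and passes linewidth through **kwargs to matplotlib. The column names and values are assumptions for illustration, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# Small made-up frame: numeric feature columns plus a class column.
df = pd.DataFrame(
    {
        "f1": [1.0, 1.2, 0.9, 3.1, 3.0, 2.8],
        "f2": [0.2, 0.1, 0.3, 1.5, 1.7, 1.6],
        "f3": [2.0, 2.1, 1.9, 0.5, 0.4, 0.6],
        "kind": ["a", "a", "a", "b", "b", "b"],
    }
)

fig, ax = plt.subplots(figsize=(6, 4))
result = pd.plotting.andrews_curves(
    df,
    "kind",              # class_column
    ax=ax,               # draw on this existing axes
    samples=100,         # points per curve
    colormap="viridis",  # choose class colors from a matplotlib colormap
    linewidth=1.0,       # extra keyword passed through to matplotlib
)
result.set_title("Andrews curves (toy data)")  # the returned axes can be customized
print(len(result.get_lines()))  # 6: one curve per row
```

Only one of color and colormap is passed here; supplying a list through color instead would set the class colors explicitly.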

Example

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv(
    'https://raw.github.com/pandas-dev/'
    'pandas/main/pandas/tests/io/data/csv/iris.csv'
)
print(df.head())
pd.plotting.andrews_curves(df, 'Name')
plt.show()
```

Explanation

  • Lines 1–2: We import the pandas and matplotlib packages.
  • Lines 4–7: We read the iris dataset into a DataFrame called df.
  • Line 8: We print the first five rows of df to inspect the data.
  • Line 9: We plot the Andrews curves using the andrews_curves() method. Here, Name is the categorical column that holds the class names (the iris species), so each species is drawn in its own color.
  • Line 10: We display the plotted graph.
