How to obtain the variance over a specified axis in pandas

Overview

The var() function in pandas computes the variance of the values along a specified axis of a given DataFrame.

Mathematically, variance measures how far the values of a dataset spread out from their mean. By default, var() computes the sample variance, which divides the sum of squared deviations by n - 1.

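The sample variance takes the formula below:

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Where:

  • $S^2$ = the sample variance
  • $x_i$ = the i-th value of the dataset
  • $\bar{x}$ = the mean of the values in the dataset
  • $n$ = the number of values in the dataset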

Equivalently, the variance of a dataset is the square of its standard deviation. That is, the standard deviation is the square root of the variance.
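
As a minimal sketch of the definition above, the snippet below computes the sample variance by hand and compares it with pandas. The list `values` and the variable names are hypothetical and chosen only to keep the arithmetic simple.

```python
import math

import pandas as pd

# Hypothetical sample values chosen for easy arithmetic
values = [1, 2, 3, 4, 5]

n = len(values)
mean = sum(values) / n  # 3.0

# Sum of squared deviations from the mean, divided by n - 1
manual_variance = sum((x - mean) ** 2 for x in values) / (n - 1)

series = pd.Series(values)
pandas_variance = series.var()   # sample variance (ddof=1 by default)
std_squared = series.std() ** 2  # square of the standard deviation

print(manual_variance)  # 2.5
print(math.isclose(manual_variance, pandas_variance))
print(math.isclose(manual_variance, std_squared))
```

The comparisons use math.isclose() rather than == because squaring the standard deviation can introduce small floating-point rounding differences.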

Syntax

The var() function takes the following syntax:

DataFrame.var(axis=0, skipna=True, ddof=1, numeric_only=False, **kwargs)
Syntax for the var() function in pandas

Parameter values

The var() function takes the following optional parameter values:

  • axis: This specifies the axis along which the variance is computed: 0 or 'index' computes the variance of each column (the default), while 1 or 'columns' computes the variance of each row.
  • skipna: This takes a boolean value indicating whether NA or null values are to be excluded. The default is True.
  • ddof: This takes an int that represents the delta degrees of freedom. The divisor used in the calculation is n - ddof, where n is the number of values. The default is 1.
  • numeric_only: This takes a boolean value indicating whether to include only float, int, or boolean columns.
  • **kwargs: These are additional keyword arguments that can be passed to the function.
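
The following sketch illustrates how these parameters might change the result. The DataFrame `df` and its column names are hypothetical and are not part of the example later in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one complete column, one with a missing value,
# and one non-numeric column
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, np.nan, 6.0, 10.0],
    "label": ["w", "x", "y", "z"],
})

# Default ddof=1: sample variance (divides by n - 1)
sample_var = df.var(numeric_only=True)

# ddof=0: population variance (divides by n)
population_var = df.var(ddof=0, numeric_only=True)

# skipna=False: a column containing NaN yields NaN
with_nan = df.var(skipna=False, numeric_only=True)

# axis=1: variance of each row across the numeric columns
row_var = df.var(axis=1, numeric_only=True)

print(sample_var)
print(population_var)
print(with_nan)
print(row_var)
```

Here numeric_only=True is passed explicitly so that the string column is left out of the calculation regardless of the default in the installed pandas version.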

Return value

When called on a DataFrame, the var() function returns a Series holding the variance of each column (axis 0) or of each row (axis 1).
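
As a quick check of the return type, the sketch below inspects the result on a small, hypothetical two-column DataFrame.

```python
import pandas as pd

# Hypothetical two-column DataFrame
df = pd.DataFrame({"A": [1, 1, 3], "B": [2, 7, 11]})

result = df.var()

# The result is a Series indexed by the column labels
print(type(result))
print(result.index.tolist())  # ['A', 'B']

# Calling var() on a single column (a Series) gives a scalar instead
column_variance = df["A"].var()
print(column_variance)
```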

Example

# A code to illustrate the var() function in pandas

# Importing the pandas library
import pandas as pd

# Creating a DataFrame
df = pd.DataFrame([[1, 2, 3, 4, 5],
                   [1, 7, 5, 9, 0.5],
                   [3, 11, 13, 14, 12]],
                  columns=list('ABCDE'))
# Printing the DataFrame
print(df)

# Obtaining the variance of each column (down the rows, axis 0)
print(df.var())

# Obtaining the variance of each row (across the columns, axis 1)
print(df.var(axis="columns"))

Explanation

  • Line 4: We import the pandas library.
  • Lines 7–10: We create a DataFrame, df.
  • Line 12: We print df.
  • Line 15: Using the var() function, we obtain the variance of each column, computed down the rows (axis 0). We print the result to the console.
  • Line 18: Using the var() function with axis="columns", we obtain the variance of each row, computed across the columns (axis 1). We print the result to the console.
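
One way to sanity-check these results is to compare them with NumPy. The sketch below rebuilds the same DataFrame and assumes NumPy is installed. Note that np.var() uses ddof=0 by default, so ddof=1 is passed explicitly to match pandas.

```python
import numpy as np
import pandas as pd

# Same data as in the example above
df = pd.DataFrame([[1, 2, 3, 4, 5],
                   [1, 7, 5, 9, 0.5],
                   [3, 11, 13, 14, 12]],
                  columns=list('ABCDE'))

values = df.to_numpy()

# Column-wise variance: pandas axis=0 vs. NumPy axis=0 with ddof=1
columns_match = np.allclose(df.var(), np.var(values, axis=0, ddof=1))

# Row-wise variance: pandas axis="columns" vs. NumPy axis=1 with ddof=1
rows_match = np.allclose(df.var(axis="columns"),
                         np.var(values, axis=1, ddof=1))

print(columns_match)
print(rows_match)
```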
