What is the DataFrame.var() function in Polars?

In Polars, the var() function of the DataFrame class calculates the variance of each column in a DataFrame. Variance is a measure of how much the values in a dataset deviate from the mean. The ddof parameter lets us specify the degrees-of-freedom adjustment used in the calculation.

Syntax

Here’s the syntax of the DataFrame.var() function:

DataFrame.var(ddof: int = 1)
Parameters

  • ddof (Delta Degrees of Freedom): The adjustment subtracted from the number of elements to form the divisor. The variance is computed by dividing by N - ddof, where N is the number of elements. The default value for ddof is 1.
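
To make the role of ddof concrete, here is a minimal pure-Python sketch of the N - ddof divisor. The helper name variance_with_ddof is hypothetical and is not part of Polars; it only illustrates the arithmetic and makes no claim about how the library computes the result internally.

```python
def variance_with_ddof(values, ddof=1):
    """Variance of a list of numbers, dividing by N - ddof (illustrative helper)."""
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more than ddof values")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - ddof)


data = [1, 2, 3, 4]
population_variance = variance_with_ddof(data, ddof=0)  # divides by N = 4
sample_variance = variance_with_ddof(data, ddof=1)      # divides by N - 1 = 3
print(population_variance, sample_variance)
```

The same sum of squared deviations is used in both calls; only the divisor changes, which is why the ddof=1 result is always the larger of the two.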

Variance

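Before using the function, let's review how variance is computed. The formula for calculating variance is as follows:

$$\mathrm{Var}(x) = \frac{1}{n - \mathrm{ddof}} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

Where:

  • $n$ is the number of data points in the dataset.

  • $x_i$ represents each individual data point.

  • $\bar{x}$ is the mean (average) of the dataset.

  • ddof is the adjustment described in the parameters section: 0 gives the population variance, and 1 gives the sample variance.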

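Let's take the array named alpha with the values [2, 5, 8, 9, 10] and calculate its variance using the formula. The mean is 34 / 5 = 6.8, and the squared deviations from the mean add up to 42.8. Dividing by 5 (ddof = 0) gives a variance of 8.56, while dividing by 4 (ddof = 1) gives 10.7.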

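
The manual calculation for alpha can be sketched step by step in plain Python. This is only a check of the arithmetic using the standard library; the variable names are illustrative and nothing here depends on Polars.

```python
import math
import statistics

alpha = [2, 5, 8, 9, 10]

n = len(alpha)                               # number of data points
mean = sum(alpha) / n                        # 34 / 5
deviations = [x - mean for x in alpha]       # x_i minus the mean
squared = [d ** 2 for d in deviations]       # squared deviations
sum_of_squares = sum(squared)

population_var = sum_of_squares / n          # divisor n      (ddof = 0)
sample_var = sum_of_squares / (n - 1)        # divisor n - 1  (ddof = 1)

print(round(mean, 2), round(sum_of_squares, 2))        # 6.8 42.8
print(round(population_var, 2), round(sample_var, 2))  # 8.56 10.7

# Cross-check against the standard library's implementations.
assert math.isclose(population_var, statistics.pvariance(alpha))
assert math.isclose(sample_var, statistics.variance(alpha))
```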

Code example

The next step is to calculate the variance of the columns of a DataFrame (alpha, beta, and gamma) using the DataFrame.var() function:

import polars as pl
df = pl.DataFrame(
{
    "alpha": [2, 5, 8, 9, 10],
    "beta": [8, 7, 6, 5, 4],
    "gamma": ["a", "b", "c", "d", "e"],  # string column, no numeric variance
})

variance = df.var(ddof=0)  # ddof=0: divide by N (population variance)
print(variance)

Code explanation

Now let’s break down the code shown above:

  • Lines 2-7: We create a DataFrame named df with three columns: alpha, beta, and gamma. The alpha and beta columns contain numerical data, while the gamma column contains strings.

  • Line 9: We calculate the variance of each column in the DataFrame using the var function. The result is a one-row DataFrame: alpha and beta hold their variances, while the string column gamma has no variance and is reported as null. The ddof parameter is set to 0, so the divisor is N and the calculation is performed without Bessel’s correction. Bessel's correction (ddof=1) divides by N - 1 instead, which makes the sample variance an unbiased estimator of the population variance. Setting ddof=0 treats the data as the entire population.

  • Line 10: We print the result.

Exercise: Call df.var() with the default value of ddof and compare the result with the ddof=0 output.

Conclusion

The DataFrame.var function in Polars provides a convenient and efficient way to calculate the variance of numerical columns within a DataFrame. The function supports an optional parameter, ddof, allowing users to choose between calculating the population variance (ddof=0) or the sample variance (ddof=1).

By default, the function applies Bessel's correction (ddof=1), which gives an unbiased estimate when the data is a sample. Note that non-numeric columns, such as string columns, are not dropped from the result; their variance is reported as null.

Copyright ©2025 Educative, Inc. All rights reserved