In Polars, the var() method of the DataFrame class calculates the variance of each column. Its ddof parameter controls the divisor used in the calculation.
Here’s the syntax of the DataFrame.var() method:
DataFrame.var(ddof: int = 1)
ddof (Delta Degrees of Freedom): The divisor used in the calculation is N - ddof, where N is the number of elements. The default value for ddof is 1.
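Before using the function, let's briefly review how variance is computed. With the divisor N - ddof, the formula is:

$$\mathrm{Var}(x) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - \mathrm{ddof}}$$

Where:
$x_i$ is the i-th value in the column
$\bar{x}$ is the mean of the values
$N$ is the number of values
$\mathrm{ddof}$ is the delta degrees of freedom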
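Let's take the array named alpha with the values [2, 5, 8, 9, 10] and calculate its variance using the formula. The mean is 34 / 5 = 6.8, and the squared deviations from the mean sum to 42.8. Dividing by N = 5 (ddof = 0) gives a variance of 8.56, while dividing by N - 1 = 4 (ddof = 1) gives 10.7.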
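The hand calculation for alpha can be reproduced in plain Python. This is a minimal sketch that uses only the standard library. The variable names are illustrative, and the statistics module is used purely as a cross-check:

```python
import statistics

alpha = [2, 5, 8, 9, 10]

n = len(alpha)
mean = sum(alpha) / n  # 34 / 5 = 6.8

# Sum of squared deviations from the mean (about 42.8).
squared_deviations = sum((x - mean) ** 2 for x in alpha)

population_variance = squared_deviations / n    # ddof = 0, about 8.56
sample_variance = squared_deviations / (n - 1)  # ddof = 1, about 10.7

# Cross-check against the standard library implementations.
print(round(population_variance, 2), statistics.pvariance(alpha))
print(round(sample_variance, 2), statistics.variance(alpha))
```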
The next step is to calculate the variance of alpha, beta, and gamma using the DataFrame.var() function:
import polars as pl

df = pl.DataFrame({
    "alpha": [2, 5, 8, 9, 10],
    "beta": [8, 7, 6, 5, 4],
    "gamma": ["a", "b", "c", "d", "e"],
})

variance = df.var(ddof=0)
print(variance)
Now let’s break down the code shown above:
Lines 2-7: We create a DataFrame named df with three columns: alpha, beta, and gamma. The alpha and beta columns contain numerical data, while the gamma column contains string data.
Line 9: We calculate the variance of each column in the DataFrame using the var function. The numerical columns (alpha and beta) produce a value, while the string column gamma produces null. The ddof parameter is set to 0, which means the calculation is performed without Bessel’s correction. Bessel's correction (dividing by N - 1 instead of N) adjusts the sample variance so that it is an unbiased estimator of the population variance. Setting ddof=0 treats the data as the entire population, so the population variance is calculated.
Line 10: We print the result.
Exercise: Try calling df.var() with the default ddof value and compare the result with the one above.
The DataFrame.var function in Polars provides a convenient and efficient way to calculate the variance of numerical columns within a DataFrame. The function supports an optional parameter, ddof, allowing users to choose between calculating the population variance (ddof=0) or the sample variance (ddof=1).
By default, the function employs Bessel's correction (ddof=1), which gives an unbiased estimate of the population variance when working with samples. Note that non-numeric columns, such as strings, are not given a variance: they appear in the result as null. Together, these behaviors make DataFrame.var a quick way to summarize the spread of every numerical column in a dataset.