Polars is a powerful library for data manipulation and analysis. It is designed to process and analyze large datasets quickly and efficiently.
The DataFrame.mean() function
The DataFrame.mean() function in Polars computes the arithmetic mean of every column, or of specific columns, in a DataFrame. The arithmetic mean is the sum of the values divided by the number of values.
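As a quick check of that definition, here is a minimal plain-Python sketch (no Polars involved) that averages the Price values from the example further down. The helper name arithmetic_mean is made up for illustration.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by how many values there are."""
    return sum(values) / len(values)


prices = [10, 20, 35, 15]
print(arithmetic_mean(prices))  # (10 + 20 + 35 + 15) / 4 = 20.0
```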
Here is the syntax of the DataFrame.mean() function:
DataFrame.mean(<axis>, <null_strategy>)
axis: This is an optional parameter that specifies the axis along which the mean is computed. The default value, 0, computes the mean of each column, and the value 1 computes the mean of each row.
null_strategy: This is also an optional parameter, and it is only used if axis = 1. It specifies how null (missing) values are handled during the computation. The default value, ignore, skips the null values and computes the mean of the remaining values in the row. The value propagate makes the result null for any row that contains a null value.
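To make the two null_strategy settings concrete, the sketch below models them in plain Python, with None standing in for a null. It is only an illustration of the semantics, not Polars code, and the helper name row_mean is hypothetical.

```python
def row_mean(row, null_strategy="ignore"):
    """Average one row of values, where None stands in for a null."""
    if null_strategy == "propagate" and any(v is None for v in row):
        return None  # a single null makes the whole result null
    present = [v for v in row if v is not None]
    if not present:
        return None  # nothing left to average
    return sum(present) / len(present)


# (Price, Quantity) pairs taken from the example table in this article
rows = [(10, 200), (20, None), (35, 70), (15, 30)]

print([row_mean(r, "ignore") for r in rows])
print([row_mean(r, "propagate") for r in rows])
```

Only the second row differs between the two strategies, because it is the only row that contains a null.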
Note: We can also compute the mean of a specific column by selecting the column by name and calling mean() on it, like this: DataFrame[<column_name>].mean(). Remember that we can't use the axis and null_strategy parameters when computing the mean of a specific column.
By default, the function returns a DataFrame with a single row that contains the mean of each numeric column. With axis = 1, the result holds one mean per row instead. If we select a specific column by name and compute its mean, we get a single mean value for that column.
Note: Non-numeric columns are not averaged. They typically appear in the result with a null value.
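The shape of the column-wise result can be sketched in plain Python as follows. This is a rough model of the behavior described above rather than how Polars implements it: the helper column_means is hypothetical, None stands in for null, and non-numeric columns are left out of the averaging and reported as None.

```python
def column_means(table):
    """Return {column name: mean}, using None for columns that are not numeric."""
    result = {}
    for name, values in table.items():
        present = [v for v in values if v is not None]  # skip nulls
        is_numeric = bool(present) and all(isinstance(v, (int, float)) for v in present)
        result[name] = sum(present) / len(present) if is_numeric else None
    return result


table = {
    "Product": ["Cookies", "Brownie", "Tortilla's wrap", "Pasta"],
    "Price": [10, 20, 35, 15],
    "Quantity": [200, None, 70, 30],
}
print(column_means(table))
```

Note that the null in Quantity is skipped, so that column's mean is taken over its three remaining values.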
Here's a coding example that uses the DataFrame.mean() method to calculate the mean of numeric columns in Polars:
import polars as pl

df = pl.DataFrame(
    {
        "Product": ["Cookies", "Brownie", "Tortilla's wrap", "Pasta"],
        "Price": [10, 20, 35, 15],
        "Quantity": [200, None, 70, 30],
    }
)
# Computing the mean of the complete table
print(df.mean())

# Computing the mean of only one column, "Quantity"
print("Mean of the column with Quantities: ", df["Quantity"].mean())

# Computing the mean along the rows
# null_strategy = 'ignore' skips the null values when computing each row's mean
print("Mean along the row with the null_strategy = ignore: ", df.mean(axis=1, null_strategy='ignore'))

# Computing the mean along the rows
# null_strategy = 'propagate' gives null for any row that contains a null value
print("Mean along the row with the null_strategy = propagate: ", df.mean(axis=1, null_strategy='propagate'))
Line 1: We import the polars library as pl.
Lines 2–9: We define our DataFrame as df for a cafe, with each product's name, price, and quantity.
Line 11: We use the df.mean() function to print the mean of the complete table.
Line 14: We select the Quantity column with df["Quantity"] and call mean() on it to print only the mean of the Quantity column.
Line 18: We use the df.mean() function with axis = 1 to compute the mean along the rows and null_strategy = 'ignore' to skip the null values in each row.
Line 22: We use the df.mean() function with axis = 1 to compute the mean along the rows and null_strategy = 'propagate' so that any row containing a null value gives a null result.