Polars is a fast DataFrame library implemented in Rust with bindings for Python. It is a data manipulation library used for processing large datasets. It is similar to pandas but optimized for performance and parallel processing, which makes it well suited to big data tasks. Polars supports data from various sources, including CSV files.
Note: We will use Python version 3.6.
We can import the polars library in our Python script or notebook, as shown below:
import polars as pl
We’ll go through the groupby() method of the polars library.
The groupby() method
The groupby() method, available in data manipulation libraries like pandas and polars, allows us to group the rows of a DataFrame based on the unique values in one or more columns. With its help, we can split data into categories and then apply functions to each category independently.
Here is the syntax for using the groupby() method:
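# importing polars
import polars as pl

data = {"id": [1, 2, 3], "grade": ["A", "B", "B"]}
df = pl.DataFrame(data)

# grouping w.r.t. the "grade" column; each iteration yields the group key
# and the sub-DataFrame of rows that share that key
for name, group in df.groupby("grade"):
    print(name)
    print(group)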
In the above code, we iterate through the groups formed by the groupby() operation based on the unique values in the grade column. In this case, groups are formed for the unique values A and B.
Operations on the groupby() method
There’s a list of operations we can apply to the grouped data. Let’s look at examples of a few of them.
We can find the maximum of the grouped data using the groupby.max() function of the polars library. It reduces each group to a single row that holds the maximum value of every other column.
# importing polars
import polars as pl

data = {
    "x": [10, 20, 30, 40, 50, 60],
    "y": [0.1, 0.2, 0.5, 1.0, 2.0, 3.0],
    "z": [False, True, False, False, True, True],
    "w": ["Red", "Blue", "Red", "Green", "Green", "Blue"],
}
df = pl.DataFrame(data)

# fetching the maximum value of each column per group of "w"
result = df.groupby("w", maintain_order=True).max()
print(result)
We can find the minimum of the grouped data using the groupby.min() function. It reduces each group to a single row that holds the minimum value of every other column.
# importing polars
import polars as pl

data = {
    "x": [10, 20, 30, 40, 50, 60],
    "y": [0.1, 0.2, 0.5, 1.0, 2.0, 3.0],
    "z": [False, True, False, False, True, True],
    "w": ["Red", "Blue", "Red", "Green", "Green", "Blue"],
}
df = pl.DataFrame(data)

# fetching the minimum value of each column per group of "w"
result = df.groupby("w", maintain_order=True).min()
print(result)
We can find the sum of the grouped data using the groupby.sum() function. It reduces each group to a single row that holds the sum of the values in every other column.
# importing polars
import polars as pl

data = {
    "x": [10, 20, 30, 40, 50, 60],
    "y": [0.1, 0.2, 0.5, 1.0, 2.0, 3.0],
    "z": [False, True, False, False, True, True],
    "w": ["Red", "Blue", "Red", "Green", "Green", "Blue"],
}
df = pl.DataFrame(data)

# fetching the sum of the values of each column per group of "w"
result = df.groupby("w", maintain_order=True).sum()
print(result)
We have explored a few examples, but there are many more methods, such as agg, mean, median, tail, and quantile. The DataFrame.groupby method is a powerful function that lets us group data efficiently and provides various operations on the grouped data.