Key takeaways:
The LazyFrame.filter() function filters rows based on specified conditions without changing the order of the remaining rows.
It accepts predicates (expressions that evaluate to boolean values) and constraints (name=value pairs that filter by specific column values).
The function returns a filtered LazyFrame, which is useful for tasks like removing outliers or creating subsets of data.
Polars is a Rust-based DataFrame library that provides Python developers a fast and efficient alternative to pandas for large-scale data manipulation. It improves performance for tasks like slicing, filtering, and transformations in Python and Rust environments.
LazyFrame is a concept in Polars that enables lazy evaluation: operations are recorded as a query plan and executed only when the result is requested. One of its key features is the filter() function. This function lets us apply conditions to the data and refine the dataset before execution. In this Answer, we will learn how to use the filter() function with LazyFrame.
We use the following command to import the polars library:
import polars as pl
The LazyFrame.filter() method
The LazyFrame.filter() function filters the rows in the LazyFrame based on the provided conditions. The rows that meet the filtering criteria stay in their original order.
Here’s the syntax of the LazyFrame.filter() function:
LazyFrame.filter(*predicates, **constraints)
predicates: One or more expressions that evaluate to a boolean series.
constraints: Keyword arguments in the name=value format that filter rows based on specific column values. Each constraint works like pl.col(name).eq(value) and is combined with the other conditions using the & operator. The resulting boolean condition is True for rows where the specified column matches the given value and False for all others, so only the matching rows are kept.
It returns a filtered LazyFrame. Calling collect() on it runs the query and produces a DataFrame, which can be stored and used in later steps. Typical use cases include removing outliers, subsetting data, and splitting data into training and testing sets.
Let’s look at a coding example of the LazyFrame.filter() function.
import polars as pl

lf = pl.LazyFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "name": ["John", "David", "Zack", "Sara", "Cardy"],
        "age": [23, 39, 27, 43, 22],
        "country": ["Germany", "Australia", "Germany", "Pakistan", "London"],
        "designation": ["Software Engineer", "Accountant", "Doctor", "Pilot", "Accountant"],
    }
)

# Users younger than 30
print(lf.filter(pl.col("age") < 30).collect())
# Users older than 20 who live in Germany
print(lf.filter((pl.col("age") > 20) & (pl.col("country") == "Germany")).collect())
# Users who are younger than 30 or work as an Accountant
print(lf.filter((pl.col("age") < 30) | (pl.col("designation") == "Accountant")).collect())
Let’s learn about the implementation line by line:
Line 1: We import the polars library and give it the pl alias to make it easier to refer to in the code.
Lines 3–11: We initialize the LazyFrame with five columns, id, name, age, country, and designation, each holding a list of values.
Line 14: We apply a filter on the LazyFrame, fetching all users whose age is less than 30.
Line 16: We apply a filter with two conditions combined by the & operator, fetching only the users whose age is greater than 20 and who live in Germany.
Line 18: We apply a filter with two conditions combined by the | operator, fetching all users who are either under 30 or an Accountant.
Polars' Rust-based DataFrame library and LazyFrame functionality provide an efficient alternative to pandas for large-scale data manipulation. The LazyFrame.filter() function filters rows on the given conditions while preserving row order, and deferred computation lets Polars optimize the query before running it.
This makes Polars well suited to managing and analyzing large datasets efficiently.