What is polars.LazyFrame.filter() function in Polars Python?

Key takeaways:

  • The LazyFrame.filter() function filters rows based on specified conditions without changing the order of the remaining rows.

  • It accepts parameters such as predicates for boolean evaluation and constraints to filter by specific column values.

  • The function returns a filtered DataFrame, which is useful for tasks like removing outliers or creating subsets of data.

Polars is a Rust-based DataFrame library that provides Python developers a fast and efficient alternative to pandas for large-scale data manipulation.

It improves performance for tasks like slicing, filtering, and transformations in Python and Rust environments. LazyFrame is a concept in Polars that enables deferred computationsThe type of computations in which the type of the output is inferred before the expression is evaluated. on a DataFrame. It allows operations to be postponed and executed only when needed, which helps optimize performance.

One of its key features is the filter() function. This function lets us apply conditions to the data and refine the dataset before execution. In this Answer, we will learn how to use the filter() function with LazyFrame.

Import the library

We use the following command to import the polars library:

import polars as pl

The LazyFrame.filter() method

The LazyFrame.filter() function filters the rows in the LazyFrame based on the provided constraints, but the default order of the rest of the rows remains intact. This means that the filtering operation does not alter the original order of the rows that meet the filtering criteria.

Syntax

Here’s the syntax of the LazyFrame.filter() function:

LazyFrame.filter(predicates, constraints)

Parameters

predicates: It evaluates the expression to a boolean series.

constraints: It use the name = value format to filter rows based on specific column values. Each constraint works like pl.col(name).eq(value) and is combined with other conditions using the & operator. This expression creates a boolean condition.

It returns True for rows where the specified column matches the given value, and False for all others. This is useful for filtering rows in a DataFrame based on the value in a specific column.

Return value

It returns a filtered DataFrame, which can be stored and utilized for further implementations. The use cases for these implementations can include removing outliers, subsetting data, and even training and testing.

Code example

Let’s look at a coding example of the LazyFrame.filter() function.

Note: Click the “Run” button to test the code.

import polars as pl
lf = pl.LazyFrame(
{
"id": [1, 2, 3, 4, 5],
"name": ["John", "David", "Zack", "Sara", "Cardy"],
"age": [23, 39, 27, 43, 22],
"country": ["Germany", "Austrailia", "Germany", "Pakistan", "London"],
"designation": ["Software Engineer", "Accountant", "Doctor", "Pilot", "Accountant"],
}
)
print(lf.filter(pl.col("age") < 30).collect())
print(lf.filter((pl.col("age") > 20) & (pl.col("country") == "Germany")).collect())
print(lf.filter((pl.col("age") < 30 ) | (pl.col("designation") == "Accountant")).collect())

Code explanation

Let’s learn about the implementation line by line:

  • Line 1: We import the polars library and give it the pl alias to make it easier to refer to in the code.

  • Lines 3–11: We initialize the LazyFrame containing five columns, id, "name", age, "country", and "designation" with a list of multiple values.

  • Line 14: We apply a filter on the LazyFrame, fetching all users whose age is less than 30.

  • Line 16: We apply a filter on the LazyFrame, now using multiple constraints, fetching the data of those users only whose age is greater than 20 and they live in Germany.

  • Line 18: We apply a filter on the LazyFrame, fetching all the users who are either under 30 or they are Accountant.

Conclusion

Polar's Rust-based DataFrame library and LazyFrame functionality provide an efficient alternative to pandas for large-scale data manipulation. The LazyFrame.filter() function allows filtering based on constraints while preserving row order and optimizing performance through deferred computations.

This makes Polars ideal for managing and analyzing large datasets efficiently.

Free Resources

Copyright ©2025 Educative, Inc. All rights reserved