The Polars library is a high-performance data manipulation and analysis tool for the Python programming language. Built on Rust and designed for parallel processing, Polars provides a memory-efficient DataFrame structure, offering a powerful and efficient alternative to traditional data manipulation libraries. One common data manipulation task is adding a column that numbers the rows. Polars provides the with_row_count() function for this.
The with_row_count() function
The DataFrame.with_row_count function adds a row-number column to a DataFrame. The new column is inserted at index 0, before the existing columns. Next, we will discuss the syntax and coding examples to understand how to number rows in Polars.
Here is the syntax of the with_row_count function:
DataFrame.with_row_count(name: str = 'row_nr', offset: int = 0)
The with_row_count() function adds a new column to the DataFrame. This new column is a series of integers starting from the offset specified in the function call. The value increases by 1 for each row, effectively creating a row count.
name: The name parameter specifies the name of the new column. By default, the name of the new column is row_nr.
offset: The offset parameter specifies the starting point of the row count. The offset is set to 0 by default, meaning the row count starts from 0. We can pass a different offset to start the count from another value.
Let’s see how to count rows with the following example (using default values):
import polars as pl

df = pl.DataFrame(
    {
        "a": [1, 3, 5, 6, 8, 10, 34],
        "b": [2, 4, 6, 5, 7, 87, 98],
        "c": [2, 3, 4, 5, 6, 7, 8],
    }
)
row_count = df.with_row_count()
print(row_count)
Lines 3–9: We create a DataFrame named df with three columns named a, b, and c.
Line 10: We call with_row_count() on df and assign the resulting DataFrame, which includes the new row-number column, to the row_count variable.
Line 11: We print the resulting DataFrame.
The output shows that a new column row_nr is added, holding the row number for each row in the DataFrame. Because the offset defaults to 0, the numbering starts from 0.
Now let’s see an example that uses different name and offset values for the new column by changing the name to My_name and setting the offset to start from 4.
import polars as pl

df = pl.DataFrame(
    {
        "a": [1, 3, 5, 6, 8, 10],
        "b": [2, 4, 6, 5, 7, 87],
        "c": [2, 3, 4, 5, 6, 7],
    }
)
row_count = df.with_row_count("My_name", 4)
print(row_count)
The output shows that a new column My_name is added, holding the row number for each row in the DataFrame, starting from 4. This can be helpful for analytical and data manipulation tasks where keeping track of row numbers is essential.