Pandas is an open-source Python library mostly used in data sciences, data analysis, and machine learning projects. In machine learning tasks, it helps clean and prepare the data to pass them into machine learning models. The most common operations performed using pandas are filling in missing values, manipulating the data, loading data from files, and visualizing data.
In this Answer, we will explore how to select a DataFrame with specific values by applying a filter on the column values.
We can filter DataFrame by column values using the following methods:
Boolean indexing: We can create a condition (Boolean expression). Thee DataFrame will then keep the rows where the condition was true and discard the rows where it was false.
filtered_df = df[df['Age'] > 18]
qeury()
method: We can pass a query in string format in the query
method. It is mainly used for complex filtering.
filtered_df = df.query('Age > 25 and Country == "USA"')
loc[]
accessor: We can access the column values of a DataFrame directly by applying conditions in the loc[]
accessor.
filtered_df = df.loc[df['Age'] > 25]
isin()
method: We can filter rows of a DataFrame based on whether the values in a specified column are present in a given list or array.
ages = [25, 30, 35]filtered_df = df[df['Age'].isin(ages)]
To learn how we can apply a filter on the column values, let's first create a data example. Consider that we have the following 2-D data present in a pandas DataFrame. The data contains the person's name, country, and age:
Name | Country | Age |
Albert | USA | 12 |
James | USA | 14 |
Alex | USA | 20 |
Bob | Canada | 25 |
Sara | Canada | 29 |
Bill | Germanny | 45 |
Daniel | Germany | 32 |
John | Egypt | 60 |
In this example, we aim to infer two separate pieces of information from the data:
The DataFrame will provide us with the following person's information if we find the person living in the USA:
Name | Country | Age |
Albert | USA | 12 |
James | USA | 14 |
Alex | USA | 20 |
The DataFrame will provide us with the following person's information if we find the person older than 18 and younger than 40:
Name | Country | Age |
Alex | USA | 20 |
Bob | Canada | 25 |
Sara | Canada | 29 |
Daniel | Germany | 32 |
In the coming sections, we will load the example data in a pandas DataFrame and apply the filters to the data.
First of all, let's create a pandas DataFrame and load it with some hard-coded data:
import pandas as pd person_data = pd.DataFrame({ "Name": ["Albert", "James", "Alex", "Bob", "Sara", "Bill", "Daniel", "John"], "Country": ["USA", "USA", "USA", "canada", "Canada", "Germany", "Germany", "Egypt"] , "Age": [12, 14, 20, 25, 29, 45, 32, 60] }) print(person_data)
Code explanation
Line 1: We import pandas
and give it the name pd
.
Lines 3–7: We create a DataFrame using pd.DataFrame
function, store it in the person_data
variable, and pass the person's information in the curly brackets {}
.
Now that we have created the DataFrame, we can move towards applying filters on column values.
import pandas as pd person_data = pd.DataFrame({ "Name": ["Albert", "James", "Alex", "Bob", "Sara", "Bill", "Daniel", "John"], "Country": ["USA", "USA", "USA", "canada", "Canada", "Germany", "Germany", "Egypt"] , "Age": [12, 14, 20, 25, 29, 45, 32, 60] }) age = [25,60] age_greater_than_25 = person_data.loc[person_data['Age'] > 25] germany_and_older_than_25 = person_data.query('Age > 25 and Country == "Germany"') age_25_and_60 = person_data[person_data["Age"].isin(age)] living_in_USA = person_data[person_data["Country"] == "USA"] age_filter = person_data[(person_data["Age"] > 18) & (person_data["Age"] < 40)] print("Age greater than 25:") print(age_greater_than_25) print("Germany citizen and age greater than 25:") print(germany_and_older_than_25) print("Age 25 and 60:") print(age_25_and_60) print("Person living in USA:") print(living_in_USA) print("Person older than 18 and younger than 40:") print(age_filter)
Line 11: We use the loc
accessor and pass the condition person_data['Age'] > 25
in the square brackets []
.
Line 12: We use the query
method and pass the condition Age > 25 and Country == "Germany"
in the method.
Line 13: We use the isin()
method and pass the age
list. The result of the isin()
method is stored in the age_25_and_60
variable.
Line 14: We use the person_data
variable and apply the filter person_data["Country"] == "USA"
in the square brackets []
. The result of applying the filter is stored in the living_in_USA
variable.
Line 15: We again use the person_data
variable and apply the filter (person_data["Age"] > 18) & (person_data["Age"] < 40)
inside the square brackets []
. The result of applying the filter is stored in the age_filter
variable.
Lines 17–30: We use the print
statement and print out the filtered data.
Pandas provide us with the capability of filtering out DataFrame values by applying filters on the column values. This can help us to extract information and get insights from the data.
Free Resources