How to split a DataFrame according to a boolean criterion

Overview

A DataFrame can be split according to a boolean criterion using a technique called boolean masking.

Boolean masking, also called boolean indexing, extracts a subset of a DataFrame's rows using a boolean vector: rows where the vector is True are kept, and rows where it is False are dropped.
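As a minimal sketch of the idea, the snippet below applies a hand-written boolean vector to a tiny DataFrame. The names toy and keep, and the values in them, are made up for illustration and are not part of the dataset used later.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the mechanism.
toy = pd.DataFrame({"value": [10, 20, 30, 40]})

# A boolean vector with one entry per row of the DataFrame.
keep = pd.Series([True, False, True, False])

# Indexing with the vector keeps only the rows marked True.
subset = toy[keep]
print(subset)
```

The original row labels are preserved in the result, so subset still carries the index values of the rows that were kept.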

Let’s understand this concept with an example.

DataFrame

Consider the following DataFrame.

import pandas as pd
records = [{"student_name":"Maya Wells","gpa":4.5,"country":"USA"},{"student_name":"Olympia Woods","gpa":5.9,"country":"Australia"},{"student_name":"Kenneth Oneal","gpa":8.5,"country":"Germany"},{"student_name":"Tobias Garcia","gpa":3.0,"country":"Ukraine"},{"student_name":"Micah Mcgee","gpa":9.0,"country":"Austria"},{"student_name":"John Mack","gpa":5.0,"country":"USA"},{"student_name":"Jack Daniels","gpa":6.7,"country":"Australia"},{"student_name":"Sarah Daniels","gpa":1.3,"country":"Australia"},{"student_name":"John Wick","gpa":10.0,"country":"USA"},{"student_name":"Zelensky","gpa":1.0,"country":"Ukraine"},{"student_name":"Jack Som","gpa":8.6,"country":"Austria"}]
df = pd.DataFrame(records)
print(df)

Explanation

  • Line 1: The pandas module is imported.
  • Line 2: Sample records for the DataFrame are defined.
  • Line 3: A pandas DataFrame is created from the sample records.
  • Line 4: The DataFrame is printed.

The dataset contains each student's name, GPA, and country.

To split the dataset into students from the USA and students not from the USA, we can use a boolean mask as follows:

mask = df['country'] == 'USA'

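The mask is a boolean Series with one entry per row: True where the country is USA and False otherwise. Indexing the DataFrame with the mask returns all students from the USA. To get all students not from the USA, negate the mask with the ~ operator, i.e., ~mask.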
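To see what the mask actually holds, the sketch below builds it for a shortened copy of the records (three of the students above, trimmed only for brevity) and checks that the mask and its negation together account for every row exactly once.

```python
import pandas as pd

# Shortened copy of the sample records, for brevity.
records = [
    {"student_name": "Maya Wells", "gpa": 4.5, "country": "USA"},
    {"student_name": "Olympia Woods", "gpa": 5.9, "country": "Australia"},
    {"student_name": "John Mack", "gpa": 5.0, "country": "USA"},
]
df = pd.DataFrame(records)

mask = df["country"] == "USA"
print(mask.tolist())     # one boolean per row
print((~mask).tolist())  # element-wise negation of the mask

# The two subsets are disjoint and together contain every row.
assert len(df[mask]) + len(df[~mask]) == len(df)
```

Note that a missing country value compares as False, so such a row would end up in the ~mask subset.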

Splitting a DataFrame

import pandas as pd
records = [{"student_name":"Maya Wells","gpa":4.5,"country":"USA"},{"student_name":"Olympia Woods","gpa":5.9,"country":"Australia"},{"student_name":"Kenneth Oneal","gpa":8.5,"country":"Germany"},{"student_name":"Tobias Garcia","gpa":3.0,"country":"Ukraine"},{"student_name":"Micah Mcgee","gpa":9.0,"country":"Austria"},{"student_name":"John Mack","gpa":5.0,"country":"USA"},{"student_name":"Jack Daniels","gpa":6.7,"country":"Australia"},{"student_name":"Sarah Daniels","gpa":1.3,"country":"Australia"},{"student_name":"John Wick","gpa":10.0,"country":"USA"},{"student_name":"Zelensky","gpa":1.0,"country":"Ukraine"},{"student_name":"Jack Som","gpa":8.6,"country":"Austria"}]
df = pd.DataFrame(records)
mask = df['country'] == 'USA'
students_from_usa = df[mask]
students_not_from_usa = df[~mask]
print("Students from USA\n", students_from_usa)
print("-" * 5)
print("Students not from USA\n", students_not_from_usa)

Explanation

  • Line 1: The pandas module is imported.
  • Line 2: Sample records for the DataFrame are defined.
  • Line 3: A pandas DataFrame is created from the sample records.
  • Line 4: We define the mask as the country column being equal to USA.
  • Line 5: We get the students from the USA with the help of the mask.
  • Line 6: We get the students not from the USA by negating the mask.
  • Lines 7 to 9: Both subsets are printed, separated by a line of dashes.
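
The same pattern extends to criteria built from more than one column. The sketch below is one possible variation, not part of the example above: it uses a shortened copy of the records and an arbitrary GPA threshold of 5.0, chosen only for illustration, and combines two conditions with the & operator.

```python
import pandas as pd

# Shortened copy of the sample records, for brevity.
records = [
    {"student_name": "Maya Wells", "gpa": 4.5, "country": "USA"},
    {"student_name": "Olympia Woods", "gpa": 5.9, "country": "Australia"},
    {"student_name": "John Mack", "gpa": 5.0, "country": "USA"},
    {"student_name": "John Wick", "gpa": 10.0, "country": "USA"},
]
df = pd.DataFrame(records)

# Each condition needs its own parentheses because & binds
# more tightly than == and >= in Python.
# The 5.0 threshold is an assumption made for this illustration.
mask = (df["country"] == "USA") & (df["gpa"] >= 5.0)

matching = df[mask]   # from the USA and GPA of at least 5.0
rest = df[~mask]      # every other student

print("Matching\n", matching)
print("-" * 5)
print("Rest\n", rest)
```

Use | instead of & when a row should match if either condition holds. Python's and and or keywords do not work element-wise on a Series, so they cannot be used here.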

Copyright ©2025 Educative, Inc. All rights reserved