What is dropna() in pandas?

Key takeaways:

  • dropna() in pandas removes rows or columns with missing values (NaN).

  • The function enhances data quality by ensuring only complete data is used for analysis.

  • Parameters like axis, how, thresh, subset, and inplace allow flexible data cleaning.

  • The default behavior removes rows with any missing value.

  • Specifying axis=1 drops columns with NaN values.

  • Using how="all" removes rows or columns where all values are NaN.

  • The thresh parameter keeps only rows or columns with at least a given number of non-NaN values.

  • The subset parameter targets specific columns or rows for NaN removal.

  • The inplace=True option modifies the original DataFrame without returning a new one.

Handling missing values is a crucial part of data cleaning. Missing values create gaps that can distort results and lead to misinterpretation or unsound conclusions. pandas provides the dropna() function for this: it drops the rows or columns that contain missing (NaN) values. Removing incomplete records can improve the quality of statistical analysis and predictive models, and it makes data visualizations cleaner.

Let’s explore how this function works with the help of a diagram.

Working of the dropna() function

In the above diagram, we apply the dropna() function to a DataFrame without specifying any parameters. This triggers the default behavior, which removes every row that contains at least one NaN value. The result is a new DataFrame without the incomplete row, so only complete records are retained for analysis.

Syntax

df.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

All the parameters passed to the function are optional; their default values are as provided in the above function syntax. Let’s try to understand the parameters we can pass to the function.

Parameters

The dropna() function in pandas is highly customizable through various parameters, enabling precise control over handling missing values. Below is a detailed breakdown of these parameters:

  • axis: Dictates the axis along which the function identifies and drops missing values.

    • axis=0 (default): Drops the rows with missing values.

    • axis=1: Drops the columns with missing values.

  • how: Defines the criterion for dropping rows or columns based on missing values.

    • any (default): Drops a row or column if it contains any NaN value.

    • all: Drops a row or column only if all of its values are NaN.

  • thresh: Sets the minimum number of non-missing values required to retain a row or column.

  • subset: Restricts the NaN check to specific labels along the other axis. When dropping rows, these are the column names to examine.

  • inplace: Decides whether to modify the DataFrame in place or return a new DataFrame.

    • True: Alters the original DataFrame.

    • False (default): Generates a new DataFrame with the changes applied.

These parameters give you the flexibility to tailor the dropna() function to fit the specific needs of your data cleansing process.
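As a minimal sketch of how axis, how, and thresh interact when applied to columns, consider the snippet below. The DataFrame and its column names (Name, Score, Notes) are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Notes" is entirely missing, "Score" is partly missing.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy"],
    "Score": [90.0, np.nan, 75.0],
    "Notes": [np.nan, np.nan, np.nan],
})

# Drop columns that contain any NaN: only "Name" survives.
any_cols = df.dropna(axis=1, how="any")

# Drop columns only when every value is NaN: "Notes" is removed.
all_cols = df.dropna(axis=1, how="all")

# Keep columns with at least 2 non-NaN values: "Name" and "Score" remain.
thresh_cols = df.dropna(axis=1, thresh=2)

print(list(any_cols.columns))
print(list(all_cols.columns))
print(list(thresh_cols.columns))
```

Each call returns a new DataFrame, so df itself is unchanged after all three calls.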

Coding examples

Let’s try to understand the dropna() function and each of its parameters with the help of code:

To keep the examples short, the import statement and dataset are shown only in the first code snippet. All subsequent snippets reuse them.

The axis parameter of dropna()

The code snippet below uses axis=0, the default, as we have not passed a value for axis.

import pandas as pd
dataset = {'Name': ["John", "Mary", "Kane", "Duke"],
'Height': [152, None, 137, 121],
'Weight': [66, 81, 54, None]}
df = pd.DataFrame(dataset)
print("Before cleaning:-")
print(df)
print('\n') #Printing an Empty line
new_df=df.dropna()
print("After cleaning:-")
print(new_df)

Explanation

Let’s understand the working of the above code:

  • Line 1: Importing the pandas library in which dropna() is defined.

  • Lines 2–4: Making a dataset with some missing values.

  • Line 5: Converting the dataset to a pandas DataFrame.

  • Line 7: Displaying the DataFrame with NaN values.

  • Line 9: Creating a new DataFrame, new_df, without the rows that contain a NaN value. The original df is left unchanged.

  • Line 11: Displaying the cleaned dataset.

Now, let’s specify axis=1 in our function and observe its behaviour.

print("Before cleaning:-")
print(df)
print('\n') #Printing an Empty line
new_df=df.dropna(axis=1)
print("After cleaning:-")
print(new_df)

As we have set axis=1 in the function, every column containing a NaN value has been dropped. Both Height and Weight contain one, so only the Name column remains.

The how parameter of dropna()

Let’s specify how="any" and observe the output.

print("Before cleaning:-")
print(df)
print('\n') #Printing an Empty line
new_df=df.dropna(how="any")
print("After cleaning:-")
print(new_df)

As we specified how="any", an entire row is dropped if it contains any NaN value. Therefore, the two rows containing one NaN value each (indices 1 and 3) were removed. This matches the default call, since how="any" is the default.

Let’s specify how="all" and see the behavior change.

print("Before cleaning:-")
print(df)
print('\n') #Printing an Empty line
new_df=df.dropna(how="all")
print("After cleaning:-")
print(new_df)

As we specified how="all", a row is dropped only if all the values are NaN. As there was no row where all the values were NaN, the DataFrame remained unchanged.
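Because the dataset above has no fully empty row, how="all" has nothing to remove. The sketch below uses a hypothetical variant of the dataset (named df_all here) in which the row at index 2 is entirely missing, to show a case where the option does take effect.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the dataset: the row at index 2 has no values at all.
df_all = pd.DataFrame({
    "Name": ["John", "Mary", None, "Duke"],
    "Height": [152, np.nan, np.nan, 121],
    "Weight": [66, 81, np.nan, np.nan],
})

# Only the fully empty row is dropped; partially filled rows are kept.
cleaned = df_all.dropna(how="all")
print(cleaned)
```

The rows at indices 1 and 3 still contain a NaN value after this call, since each of them has at least one non-missing value.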

The thresh parameter of dropna()

Let’s pass a specific value of the thresh parameter and observe the function.

print("Before cleaning:-")
print(df)
print('\n') #Printing an Empty line
new_df=df.dropna(thresh=3)
print("After cleaning:-")
print(new_df)

As we specified thresh=3 in the above code snippet, rows containing fewer than 3 non-NaN values are dropped. Therefore, the rows at indices 1 and 3 were dropped, as they contained only 2 non-NaN values each.

We can further improve our understanding of this parameter by specifying thresh=4.

print("Before cleaning:-")
print(df)
print('\n') #Printing an Empty line
new_df=df.dropna(thresh=4)
print("After cleaning:-")
print(new_df)

As the DataFrame has only 3 columns, no row can have 4 non-NaN values. Every row is therefore dropped, and the result is an empty DataFrame that still has the original column labels.
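One way to reason about thresh is to count the non-NaN values per row yourself. The sketch below rebuilds the same dataset and compares a manual filter against dropna(); the variable names non_missing, manual, and empty are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Mary", "Kane", "Duke"],
    "Height": [152, None, 137, 121],
    "Weight": [66, 81, 54, None],
})

# Number of non-NaN values in each row; thresh is compared against this count.
non_missing = df.notna().sum(axis=1)
print(non_missing.tolist())

# Keeping rows whose count is at least 3 gives the same rows as thresh=3.
manual = df[non_missing >= 3]
print(manual.equals(df.dropna(thresh=3)))

# thresh=4 cannot be met with 3 columns: no rows remain, the columns do.
empty = df.dropna(thresh=4)
print(empty.empty, list(empty.columns))
```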

The subset parameter of dropna()

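Now we will specify a subset of columns to check for NaN values when deciding which rows to drop. Let’s try with subset="Height". Passing a single label like this works in recent pandas versions; a list such as subset=["Height"] is the more widely supported form.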

print("Before cleaning:-")
print(df)
print('\n') #Printing an Empty line
new_df=df.dropna(subset="Height")
print("After cleaning:-")
print(new_df)

Explanation

As we specified subset="Height", only the row with a NaN value in the Height column (index 1) was dropped. The row at index 3 has a NaN value in the Weight column, but it was not dropped because Weight was not passed in the subset parameter.
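The subset parameter also accepts several labels and can be combined with how. The following is a small sketch on the same dataset, using the list form of subset; the result names (by_height, by_either, by_both) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Mary", "Kane", "Duke"],
    "Height": [152, None, 137, 121],
    "Weight": [66, 81, 54, None],
})

# Check only "Height": the row at index 1 is dropped.
by_height = df.dropna(subset=["Height"])

# Check both columns: a NaN in either one drops the row (how="any" is the default).
by_either = df.dropna(subset=["Height", "Weight"])

# Drop a row only if "Height" and "Weight" are both missing; no row qualifies here.
by_both = df.dropna(subset=["Height", "Weight"], how="all")

print(list(by_height.index))
print(list(by_either.index))
print(list(by_both.index))
```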

The inplace parameter of dropna()

Let’s observe the default behaviour without passing the inplace parameter. The default is inplace=False.

print("Before cleaning:-")
print(df)
print('\n') #Printing an Empty line
df.dropna()
print("After cleaning:-")
print(df)

No change occurred in the original DataFrame. This happened because we didn’t set the inplace parameter to True, so dropna() returned a new DataFrame, which we discarded.

Let’s specify inplace=True and observe the changes in the DataFrame.

print("Before cleaning:-")
print(df)
print('\n') #Printing an Empty line
df.dropna(inplace=True)
print("After cleaning:-")
print(df)

We notice that once we set inplace=True, the original DataFrame has been modified.
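The two styles can be compared side by side. The sketch below is self-contained and rebuilds the dataset twice so that each style starts from the same data; the names original, cleaned, and in_place are illustrative only.

```python
import pandas as pd

dataset = {
    "Name": ["John", "Mary", "Kane", "Duke"],
    "Height": [152, None, 137, 121],
    "Weight": [66, 81, 54, None],
}

# Style 1: keep the original and bind the cleaned copy to a new name.
original = pd.DataFrame(dataset)
cleaned = original.dropna()

# Style 2: modify the DataFrame itself; the call is used for its side effect.
in_place = pd.DataFrame(dataset)
in_place.dropna(inplace=True)

print(len(original), len(cleaned), len(in_place))
print(cleaned.equals(in_place))
```

Both styles produce the same cleaned data. Assigning the result of dropna() to a name is often the easier style to follow, because the original DataFrame remains available for comparison.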

Conclusion

The dropna() function in pandas is an essential tool for handling missing values in datasets. It cleans data by removing rows or columns that contain NaN values, which makes accurate analysis easier. The axis, how, thresh, subset, and inplace parameters let you tailor exactly which rows or columns are removed, so that analysis and predictions work only with complete records.

Frequently asked questions



What is the use of `dropna()` in pandas?

The use of dropna() in pandas is to remove rows or columns containing missing (NaN) values to clean the data for analysis.


How can you drop NaN in pandas?

Use df.dropna() to remove rows with NaN values. You can also specify axis=1 to drop columns with NaN.


What is the difference between `dropna()` and `notna()`?

dropna() removes rows/columns with NaN values, while notna() returns a DataFrame of boolean values, marking True for non-NaN entries and False for NaN.
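The relationship between the two can be sketched as follows, again on the same dataset. Filtering with a notna() mask on one column keeps the same rows as dropna() with that column as the subset; the names mask and filtered are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Mary", "Kane", "Duke"],
    "Height": [152, None, 137, 121],
    "Weight": [66, 81, 54, None],
})

# notna() removes nothing: it returns a boolean DataFrame of the same shape.
mask = df.notna()
print(mask)

# Using the mask of a single column as a row filter matches dropna(subset=[...]).
filtered = df[df["Height"].notna()]
print(filtered.equals(df.dropna(subset=["Height"])))
```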



Copyright ©2025 Educative, Inc. All rights reserved