How to bin numerical columns into groups using pandas

Binning is the process of grouping a continuous numerical variable into a smaller number of discrete intervals, or bins, which turns it into a categorical variable. It is a common preprocessing technique in data analysis and machine learning.

Binning simplifies the data, which helps when summarizing or visualizing it, identifying patterns or trends, and preparing features for modeling.

The cut() function in pandas

The pandas library provides the cut() function for binning numerical columns. This function takes a numerical column as input and either divides its range into the specified number of equal-width bins or uses the bin edges provided.

Example

The following example code illustrates how to bin a numerical column using the cut() function.

import pandas as pd
# Create sample DataFrame
data = pd.DataFrame({'Unit': [5, 15, 20, 25, 30, 40, 45, 50]})
# Bin the Unit column into 3 equal-width bins
data['UnitGroup'] = pd.cut(data['Unit'], bins=3)
# Print the DataFrame
print(data)

Explanation

  • Line 1: We import pandas with the pd alias.

  • Line 3: We create a DataFrame with a single column named Unit.

  • Line 5: We use the cut() function to create a new column named UnitGroup by binning the Unit column into 3 equal-width bins.

  • Line 7: We print the DataFrame, which now includes the UnitGroup column holding the interval that each Unit value falls into.
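Besides an integer count, cut() also accepts a list of bin edges. The following is a minimal sketch of that form; the edges 0, 20, 40, and 60 are assumptions chosen for illustration and are not part of the original example.

```python
import pandas as pd

# Same sample data as in the example above
data = pd.DataFrame({'Unit': [5, 15, 20, 25, 30, 40, 45, 50]})

# Hypothetical edges chosen for illustration.
# Intervals are right-closed by default: (0, 20], (20, 40], (40, 60]
edges = [0, 20, 40, 60]
data['UnitGroup'] = pd.cut(data['Unit'], bins=edges)

# Count how many rows fall into each interval, in interval order
counts = data['UnitGroup'].value_counts().sort_index()

print(data)
print(counts)
```

With these edges, 5, 15, and 20 fall into (0, 20], the values 25, 30, and 40 fall into (20, 40], and 45 and 50 fall into (40, 60], because each interval includes its right edge by default.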

If you want to display labels instead of interval notation, you can pass the labels parameter to the cut() function. The data is still divided into equal-width bins, but each bin is shown with the label assigned to it.

data['UnitGroup'] = pd.cut(data['Unit'], bins=2, labels=['Slightly High', 'Very High'])
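As a self-contained sketch of labeled binning, the snippet below rebuilds the sample DataFrame and applies two labels. The label names Low and High are illustrative assumptions, and the labels=False variant is included only to show the integer bin codes that cut() can return instead of labels.

```python
import pandas as pd

data = pd.DataFrame({'Unit': [5, 15, 20, 25, 30, 40, 45, 50]})

# Two equal-width bins over the range 5..50; the split point is 27.5.
# The label names are illustrative, not prescribed by pandas.
data['UnitGroup'] = pd.cut(data['Unit'], bins=2, labels=['Low', 'High'])
labels = data['UnitGroup'].tolist()

# labels=False returns the integer position of each bin instead of a label
codes = pd.cut(data['Unit'], bins=2, labels=False).tolist()

print(data)
print(codes)
```

The number of labels must match the number of bins, so two bins need exactly two labels.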

The qcut() function in pandas

Quantile cut, or qcut(), is a function that divides a set of values into bins according to the sample quantiles. Each bin holds roughly the same number of values rather than spanning the same width, which is helpful when working with skewed data.
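To make the difference on skewed data concrete, here is a small sketch that bins the same values with both functions and compares the bin counts. The sample values are hypothetical and were chosen so that one outlier stretches the range.

```python
import pandas as pd

# Hypothetical right-skewed sample: most values are small, one is very large
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 100])

# Equal-width bins: the outlier stretches the range, so most values share one bin
equal_width = pd.cut(values, bins=4).value_counts().sort_index()

# Equal-frequency bins: edges follow the quartiles, so the counts are balanced
equal_freq = pd.qcut(values, q=4).value_counts().sort_index()

print(equal_width)
print(equal_freq)
```

Here cut() places seven of the eight values in the first bin, while qcut() places two values in each bin.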

Example

Here is an illustration of how to bin numerical columns in pandas using the qcut() function:

import pandas as pd
# create a DataFrame with a numerical column
data = pd.DataFrame({'Unit': [5, 15, 20, 25, 30, 40, 45, 50]})
# bin the numerical column into 3 groups
data['QcutBin'] = pd.qcut(data['Unit'], q=3)
# print the new DataFrame with the binned column
print(data)

Explanation

  • Line 1: We import pandas with the pd alias.

  • Line 3: We create a DataFrame with a single column named Unit.

  • Line 5: We use qcut() to bin the values in that column into 3 groups, and store the results in a new column called QcutBin. The q parameter specifies the number of quantiles to split the data into, in this case 3.

  • Line 7: We print the DataFrame with the binned column.
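The qcut() function also accepts labels, and it can return the bin edges it computed through the retbins parameter. The sketch below shows both on the same sample data; the label names Low, Medium, and High are assumptions made for illustration.

```python
import pandas as pd

data = pd.DataFrame({'Unit': [5, 15, 20, 25, 30, 40, 45, 50]})

# Label names are illustrative; retbins=True also returns the computed edges
data['QcutBin'], edges = pd.qcut(
    data['Unit'], q=3, labels=['Low', 'Medium', 'High'], retbins=True
)

# Count the rows per label, in label order
counts = data['QcutBin'].value_counts().sort_index()

print(data)
print(edges)
print(counts)
```

For this sample, the counts come out as 3, 2, and 3. Eight values cannot be split evenly into three bins, so qcut() gets as close to equal frequency as the data allows.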

Conclusion

In this Answer, we learned how to use the cut() function to group a numerical column into equal-width bins and the qcut() function to group it into equal-frequency bins. By using these functions effectively, we can convert continuous data into categorical data, which can be simpler to analyze and interpret.
