Binning in pandas is the process of grouping a continuous numerical variable into a smaller number of discrete intervals, or bins, which converts it into a categorical variable. It is a common data preprocessing technique in data analysis and machine learning.
Binning simplifies the data, which makes it easier to summarize and visualize, to spot patterns or trends, and to feed into models that expect categorical input.
cut() function in pandas

The pandas library provides a convenient way of binning numerical columns using the cut() function. This function takes a numerical column as input and divides it into bins. If bins is an integer, the range of the data is split into that many equal-width intervals. If bins is a list of bin edges, those edges define the intervals, which do not have to be the same width.
The following example code illustrates how to bin a numerical column using the cut() function.

import pandas as pd

# Create sample DataFrame
data = pd.DataFrame({'Unit': [5, 15, 20, 25, 30, 40, 45, 50]})

# Bin the Unit column into 3 equal-width bins
data['UnitGroup'] = pd.cut(data['Unit'], bins=3)

# Print the DataFrame
print(data)

Line 1: Import pandas with the pd alias.
Line 4: We create a DataFrame with a single column named Unit.
Line 7: We use the cut() function to create a new column named UnitGroup by binning the Unit column into 3 equal-width bins.
Line 10: We print the DataFrame, which now has a UnitGroup column holding the bin interval for each row of the Unit column.
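
cut() can also take explicit bin edges instead of a bin count. The following is a minimal sketch of that usage; the edges 0, 20, 40, and 60 are assumed values chosen only for illustration.

```python
import pandas as pd

data = pd.DataFrame({'Unit': [5, 15, 20, 25, 30, 40, 45, 50]})

# Illustrative bin edges (an assumption, not part of the original example).
# By default each interval excludes its left edge and includes its right edge,
# so 20 falls into (0, 20] and 40 falls into (20, 40].
edges = [0, 20, 40, 60]
data['UnitGroup'] = pd.cut(data['Unit'], bins=edges)

print(data)
```

Explicit edges are useful when the bin boundaries have a meaning of their own, such as fixed price bands, rather than being derived from the range of the data.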
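If you want to display labels instead of interval notation, you can pass the labels parameter to cut(). The number of labels must match the number of bins. The qcut() function accepts the same parameter. For example, the following line splits the Unit column into two equal-width bins and names them: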
data['UnitGroup'] = pd.cut(data['Unit'], bins=2, labels=['slightly high', 'Very High'])
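
To see labels in context, here is a small runnable sketch. The label names 'Low' and 'High' and the UnitCode column are assumptions made for illustration. It also shows labels=False, a cut() option that returns the integer position of each bin instead of a label.

```python
import pandas as pd

data = pd.DataFrame({'Unit': [5, 15, 20, 25, 30, 40, 45, 50]})

# Named bins: one label per bin (label names are illustrative)
data['UnitGroup'] = pd.cut(data['Unit'], bins=2, labels=['Low', 'High'])

# labels=False returns the integer position of each bin instead of a label
data['UnitCode'] = pd.cut(data['Unit'], bins=2, labels=False)

print(data)

# Count how many rows landed in each named bin
print(data['UnitGroup'].value_counts())
```

Integer codes can be handy when the binned column is passed straight to a model, while named labels are usually easier to read in tables and plots.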

qcut() function in pandas

Quantile cut, or qcut(), is a function that divides a set of values into bins according to the sample quantiles. The bin edges are chosen so that each bin holds roughly the same number of values, even though the bins may differ in width. This is helpful when working with skewed data, where equal-width bins would leave some bins nearly empty.
Here is an illustration of how to bin numerical columns in pandas using the qcut() function:

import pandas as pd

# Create a DataFrame with a numerical column
data = pd.DataFrame({'Unit': [5, 15, 20, 25, 30, 40, 45, 50]})

# Bin the numerical column into 3 quantile-based groups
data['QcutBin'] = pd.qcut(data['Unit'], q=3)

# Print the new DataFrame with the binned column
print(data)

Line 1: Import pandas with the pd alias.
Line 4: We create a DataFrame with a single column named Unit.
Line 7: We use qcut() to bin the values in that column into 3 groups, and store the results in a new column called QcutBin. The q parameter specifies the number of quantile-based bins to create, in this case 3.
Line 10: We print the new DataFrame with the binned column.
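
The difference between the two functions is easiest to see on skewed data. The sketch below uses a made-up sample (an assumption for illustration, not data from this Answer) in which one value is far larger than the rest, and compares how many values fall into each bin.

```python
import pandas as pd

# Hypothetical right-skewed sample: most values are small, one is very large
skewed = pd.Series([1, 2, 2, 3, 3, 4, 5, 6, 100])

# Equal-width bins: the outlier stretches the range, so most values share one bin
width_bins = pd.cut(skewed, bins=3)

# Equal-frequency bins: edges follow the quantiles, so the counts stay balanced
freq_bins = pd.qcut(skewed, q=3)

print(width_bins.value_counts(sort=False))
print(freq_bins.value_counts(sort=False))
```

With this sample, cut() places eight of the nine values in the first bin and leaves the middle bin empty, while qcut() places three values in each bin.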

In this Answer, we learned how to use the cut() and qcut() functions in pandas to group numerical columns into bins: cut() creates equal-width bins or bins with custom edges, while qcut() creates equal-frequency bins based on quantiles. By using these functions effectively, we can convert continuous data into categorical data, which can be simpler to analyze and interpret.