In pandas, cross-tabulation builds a frequency table of two (or more) factors: it counts how often each combination of values occurs. With the help of cross-tabulation, we can find the joint frequency distribution of the variables.
pandas.crosstab(index, columns, values = None, rownames = None, colnames = None, aggfunc = None, margins = False, margins_name = 'All', dropna = True, normalize = False)
To use cross-tabulation, we call the built-in function pandas.crosstab. In the index option, we pass the values that are used as the rows, and in the columns option, we pass the values that are used as the columns. The other options, like values, rownames, and colnames, are optional and can be passed whenever they are required. Otherwise, their default values are used.
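As a minimal sketch of how the optional values argument changes the result, the example below first counts combinations (the default) and then aggregates a numeric column per cell. The department and salary data are hypothetical and are not part of the employee example used in this article; note that values has to be passed together with aggfunc.

```python
import pandas as pd

# Hypothetical data for illustration only (not the article's employee table).
df = pd.DataFrame({
    "department": ["IT", "IT", "HR", "HR", "IT"],
    "gender": ["Male", "Female", "Female", "Male", "Male"],
    "salary": [50, 60, 40, 45, 70],
})

# Default behaviour: count how often each (department, gender) pair occurs.
counts = pd.crosstab(df["department"], df["gender"])

# With values and aggfunc: aggregate the salary column per cell instead of counting.
mean_salary = pd.crosstab(
    df["department"],
    df["gender"],
    values=df["salary"],
    aggfunc="mean",
)

print(counts)
print(mean_salary)
```

When Series are passed, as here, the row and column labels default to the Series names, so rownames and colnames are only needed to override them or when plain arrays are used.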
For instance, consider a table with the following data:
Employee Name | Nationality | Gender |
Jerry | Germany | Male |
Harry | USA | Male |
Emma | USA | Female |
Amalia | China | Female |
Suppose we want to find out how many employees from each country are male and how many are female. Pandas helps us do this via the crosstab() function:
import numpy as np
import pandas as pd

employee_name = np.array(["Jerry", "Harry", "Emma", "Amalia"], dtype = object)
nationality = np.array(["Germany", "USA", "USA", "China"], dtype = object)
gender = np.array(["Male", "Male", "Female", "Female"], dtype = object)
print(pd.crosstab(nationality, gender, rownames = ['Nationality'], colnames = ['Gender']))
We see the results below (pandas sorts the row and column labels alphabetically):
Nationality | Female | Male |
China | 1 | 0 |
Germany | 0 | 1 |
USA | 1 | 1 |
Lines 4–6: Store the data in the employee_name, nationality, and gender variables, respectively.
Line 7: Call the crosstab() function to cross-tabulate the data, passing nationality as the rows and gender as the columns.
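The margins and normalize options listed in the signature can be sketched on the same employee data. This is an illustrative extension rather than part of the original example; the variable names with_totals and shares are chosen here for clarity.

```python
import numpy as np
import pandas as pd

nationality = np.array(["Germany", "USA", "USA", "China"], dtype=object)
gender = np.array(["Male", "Male", "Female", "Female"], dtype=object)

# margins=True appends an "All" row and column holding the totals.
with_totals = pd.crosstab(
    nationality, gender,
    rownames=["Nationality"], colnames=["Gender"],
    margins=True,
)

# normalize="index" turns each row into proportions that sum to 1.
shares = pd.crosstab(
    nationality, gender,
    rownames=["Nationality"], colnames=["Gender"],
    normalize="index",
)

print(with_totals)
print(shares)
```

The label of the totals row and column can be changed with margins_name, which defaults to 'All'.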