How to perform cross-tabulation on a column in pandas

In pandas, the crosstab() function computes a cross-tabulation of two (or more) factors: a table that counts how often each combination of their values occurs. This gives us the joint frequency distribution of the variables.
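To make "frequency distribution" concrete, here is a minimal sketch on a small hypothetical dataset (the team and shift columns and their values are made up for illustration). It builds the frequency table with crosstab() and checks that it matches grouping by both columns and counting rows with groupby().size().unstack().

```python
import pandas as pd

# Hypothetical data: one row per observation
df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "shift": ["day", "night", "day", "day", "night"],
})

# Frequency table: rows are teams, columns are shifts, cells are counts
table = pd.crosstab(df["team"], df["shift"])
print(table)

# The same counts, computed by grouping on both columns and counting rows
counts = df.groupby(["team", "shift"]).size().unstack(fill_value=0)
print((table == counts).all().all())
```

In other words, crosstab() is a convenient shortcut for counting the rows in each (row value, column value) group and laying the counts out as a grid.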

Syntax

pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)
Syntax of crosstab function

To use cross-tabulation, we call pandas.crosstab(). The index argument takes the values to group by in the rows, and the columns argument takes the values to group by in the columns; each can be an array, a Series, or a list of arrays or Series. The remaining arguments, such as values, rownames, and colnames, are optional; when they are omitted, the defaults shown in the signature are used. By default, each cell of the result holds the number of times that row and column combination occurs.
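The optional arguments change what each cell holds. The sketch below uses a hypothetical department/level/salary dataset (made up for illustration) to contrast the default counts with normalize="index", which turns each row into proportions, and with values plus aggfunc, which aggregates a third column instead of counting rows. Note that aggfunc requires values to be passed as well.

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({
    "dept": ["Sales", "Sales", "IT", "IT"],
    "level": ["Junior", "Senior", "Junior", "Junior"],
    "salary": [40, 60, 50, 70],
})

# Default: count rows for each (dept, level) pair; missing pairs become 0
counts = pd.crosstab(df["dept"], df["level"])

# normalize="index": divide each row by its total, so every row sums to 1
shares = pd.crosstab(df["dept"], df["level"], normalize="index")

# values + aggfunc: average the salary in each cell instead of counting rows;
# cells with no matching rows are NaN
mean_salary = pd.crosstab(df["dept"], df["level"], values=df["salary"], aggfunc="mean")

print(counts, shares, mean_salary, sep="\n\n")
```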

Example

For instance, consider a table with the following data:

Employee Name   Nationality   Gender
Jerry           Germany       Male
Harry           USA           Male
Emma            USA           Female
Amalia          China         Female

Suppose we want to count how many male and female employees come from each country. The crosstab() function does this directly:

import numpy as np
import pandas as pd

# Employee data: the i-th entry of each array describes the same employee
employee_name = np.array(["Jerry", "Harry", "Emma", "Amalia"], dtype=object)
nationality = np.array(["Germany", "USA", "USA", "China"], dtype=object)
gender = np.array(["Male", "Male", "Female", "Female"], dtype=object)

# Count the employees for each (nationality, gender) pair
print(pd.crosstab(nationality, gender, rownames=['Nationality'], colnames=['Gender']))

We see the results below:

Gender       Female  Male
Nationality
China             1     0
Germany           0     1
USA               1     1
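The table above has no totals. As a sketch, passing margins=True (listed in the syntax) appends an "All" row and an "All" column holding the row totals, column totals, and grand total; margins_name can rename that label. The arrays below repeat the example's nationality and gender data.

```python
import numpy as np
import pandas as pd

nationality = np.array(["Germany", "USA", "USA", "China"], dtype=object)
gender = np.array(["Male", "Male", "Female", "Female"], dtype=object)

# margins=True adds an "All" row and column with the totals
table = pd.crosstab(nationality, gender,
                    rownames=["Nationality"], colnames=["Gender"],
                    margins=True)
print(table)
```

Here the "All" column shows two employees from the USA and one each from China and Germany, and the bottom-right cell is the total number of employees.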

Explanation

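  • The three np.array() calls store the data in the employee_name, nationality, and gender variables, respectively. Only nationality and gender are used in the cross-tabulation.

  • The crosstab() call passes nationality as the rows (index) and gender as the columns, and uses rownames and colnames to label the two axes. Each cell counts the employees with that combination of nationality and gender.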
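Since the data usually lives in a DataFrame, here is a sketch of the same cross-tabulation performed on its columns. The column names mirror the table at the start of the example. When Series are passed, crosstab() takes their names as the row and column labels, so rownames and colnames are not needed.

```python
import pandas as pd

# The example's employee data as a DataFrame, one column per attribute
df = pd.DataFrame({
    "Employee Name": ["Jerry", "Harry", "Emma", "Amalia"],
    "Nationality": ["Germany", "USA", "USA", "China"],
    "Gender": ["Male", "Male", "Female", "Female"],
})

# Cross-tabulate two columns; the Series names label the axes
table = pd.crosstab(df["Nationality"], df["Gender"])
print(table)
```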


Copyright ©2025 Educative, Inc. All rights reserved