A DataFrame is a commonly used two-dimensional data structure: a table made up of rows and columns. It is one of the core objects of the pandas library.
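To make the "rows and columns" description concrete, here is a minimal sketch. It builds a tiny DataFrame with made-up values and inspects its dimensions. The column names and values are illustrative only.

```python
import pandas as pd

# Each dictionary key becomes a column; each list position becomes a row
df = pd.DataFrame({"country": ["Peru", "India"], "a_score": [6, 2]})

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column labels
print(df.ndim)           # a DataFrame is two-dimensional
```

Here `shape` reports two rows and two columns, and `ndim` confirms the two dimensions.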
pandas library
We use the following statement to import the pandas library.
import pandas as pd
A DataFrame can be created as shown below. The one we use contains countries that have been placed in different groups, each with its own a_score and b_score. Both scores are made-up values for this example.
import pandas as pd

# Made-up scores for ten countries, each assigned to group A, B or C
a_score = [4, 5, 7, 8, 2, 3, 1, 6, 9, 10]
b_score = [1, 2, 3, 4, 5, 6, 7, 10, 8, 9]
country = ['Pakistan', 'USA', 'Canada', 'Brazil', 'India', 'Belgium', 'Malaysia', 'Peru', 'England', 'Scotland']
groups = ['A', 'A', 'B', 'A', 'B', 'B', 'C', 'A', 'C', 'C']

# Each dictionary key becomes a column name
df = pd.DataFrame({'group': groups, 'country': country, 'a_score': a_score, 'b_score': b_score})
print(df)
The nunique() function counts the number of distinct values in each column of a DataFrame, or in a single column (a Series). By default, missing values (NaN) are not counted.
It is useful in situations where the number of categories is unknown beforehand.
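For example, the number of groups in the data can be read from the data itself rather than hard-coded. The following is a minimal sketch that reuses the made-up group labels from this article. It also checks the count against the length of the array returned by `unique()`.

```python
import pandas as pd

groups = ["A", "A", "B", "A", "B", "B", "C", "A", "C", "C"]
df = pd.DataFrame({"group": groups})

# Number of distinct categories, discovered from the data
n_groups = df["group"].nunique()
print(n_groups)

# nunique() agrees with the length of the array returned by unique()
assert n_groups == len(df["group"].unique())
```

This prints 3, because the column contains the labels A, B and C.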
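The method is typically called as follows.
mynumber = df.nunique()
It can be called without arguments. Its full signature is DataFrame.nunique(axis=0, dropna=True):
- axis: 0 (the default) counts unique values per column, and 1 counts them per row.
- dropna: True (the default) excludes NaN from the count.
Called on a DataFrame, the method returns a Series holding the number of unique entries in each column. Called on a single column (a Series), it returns a single integer.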
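The `axis` and `dropna` parameters are easiest to see on a small table that contains a missing value. The sketch below uses a hypothetical DataFrame with one NaN. It compares the default call with `dropna=False` and with `axis=1`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a_score contains a duplicate (4) and a missing value
df = pd.DataFrame({
    "a_score": [4, 4, np.nan, 7],
    "b_score": [1, 2, 3, 4],
})

per_column = df.nunique()                   # NaN ignored: a_score -> 2
per_column_nan = df.nunique(dropna=False)   # NaN counted as a value: a_score -> 3
per_row = df.nunique(axis=1)                # one count per row

print(per_column)
print(per_column_nan)
print(per_row)
```

In the row-wise result, the third row counts as 1 because its NaN is dropped. Every other row holds two different numbers.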
The following example prints the number of unique entries in a_score. Next, it prints the number of unique entries in every column.
import pandas as pd

# The scores now contain repeated values, so some unique counts are below 10
a_score = [4, 5, 7, 4, 2, 4, 1, 1, 5, 10]
b_score = [1, 2, 3, 4, 3, 6, 4, 10, 1, 9]
country = ['Pakistan', 'USA', 'Canada', 'Brazil', 'India', 'Belgium', 'Malaysia', 'Peru', 'England', 'Scotland']
groups = ['A', 'A', 'B', 'A', 'B', 'B', 'C', 'A', 'C', 'C']
df = pd.DataFrame({'group': groups, 'country': country, 'a_score': a_score, 'b_score': b_score})

print("the main dataframe")
print(df)
print("")

# On a single column (a Series), nunique() returns one integer
print("unique entries in a_score = ")
print(df.a_score.nunique())
print("")

# On the whole DataFrame, nunique() returns one count per column
print("unique entries in all columns = ")
print(df.nunique())
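The counts from this example can be checked by hand:
- a_score has six distinct values (1, 2, 4, 5, 7, 10).
- b_score has seven (1, 2, 3, 4, 6, 9, 10).
- group has three (A, B, C).
- All ten country names are different.

The sketch below rebuilds the same data and compares `nunique()` with a plain Python `set`-based count. The `by_set` helper dictionary is introduced here only for illustration.

```python
import pandas as pd

a_score = [4, 5, 7, 4, 2, 4, 1, 1, 5, 10]
b_score = [1, 2, 3, 4, 3, 6, 4, 10, 1, 9]
country = ['Pakistan', 'USA', 'Canada', 'Brazil', 'India', 'Belgium', 'Malaysia', 'Peru', 'England', 'Scotland']
groups = ['A', 'A', 'B', 'A', 'B', 'B', 'C', 'A', 'C', 'C']
df = pd.DataFrame({'group': groups, 'country': country, 'a_score': a_score, 'b_score': b_score})

# Unique counts per column, as a Series indexed by column name
counts = df.nunique()

# Illustrative cross-check: size of the set of values in each column
by_set = {col: len(set(df[col])) for col in df.columns}

print(counts.to_dict())
print(by_set)
```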