pivot_table()
creates a spreadsheet-style pivot table as a DataFrame
in Pandas. The levels of the pivot table are stored in multiIndex objects, i.e., hierarchical indexes on the index and columns of the resultant DataFrame
.
pandas.pivot_table(data,values=None,index=None,columns=None,aggfunc=’mean’,fill_value=None,margins=False,dropna=True,margins_name='All',observed=False)
data
: This parameter represents the DataFrame
.values
: This is the column to aggregate. It is an optional parameter.index
: columns
: Note: Forindex
andcolumns
parameter, if the array is passed, its length must be the same as the length of data. The list can contain any type other than the list.
aggfunc
: numpy.mean
Note: If we pass the list of functions, the resultant pivot table contains hierarchical columns. The top level of these columns are the names of functions. These names are concluded from the objects of functions by themselves. If we pass dict, the key is the column to aggregate. The value is a function or list of functions.
fill_value
: None
margins
: False
dropna
: True
margins_name
: 'All'
observed
: False
True
, it displays the observed values for categorical grouper. If it is False
, it displays all the values for categorical groupers.sort
: True
pivot_table()
returns the DataFrame
, which is an Excel style pivot table.
Let us discuss an example regarding the pivot_table()
method.
import pandas as pdimport numpy as npdtf = pd.DataFrame({"cl1": ["pr1", "pr1", "pr1", "pr1", "pr1","pr2", "pr2", "pr2", "pr2"],"cl2": ["1", "1", "1", "2", "2","1", "1", "2", "2"],"cl3": ["half", "half", "half", "half","full", "full", "half", "half","full"],"cl4": [6, 3, 2, 6, 3, 2, 5, 1, 7],"cl5": [3, 4, 5, 2, 6, 5, 8, 8, 5]})print(dtf)# In the above code, we defined the dataframe.tbl1 = pd.pivot_table(dtf, values='cl4', index=['cl1', 'cl2'], columns=['cl3'], aggfunc=np.sum)print(tbl1)tbl2 = pd.pivot_table(dtf, values='cl4', index=['cl1', 'cl2'], columns=['cl3'], aggfunc=np.sum, fill_value=0)print(tbl2)tbl3 = pd.pivot_table(dtf, values=['cl4', 'cl5'], index=['cl1', 'cl3'], aggfunc={'cl4': np.mean,'cl5': np.mean})print(tbl3)tbl4 = pd.pivot_table(dtf, values=['cl4', 'cl5'], index=['cl1', 'cl3'], aggfunc={'cl4': np.mean,'cl5': [min, max, np.mean]})print(tbl4)
fill_value
parameter to fill the missing value.