In Pandas, the duplicated() function returns a Boolean series indicating duplicated rows of a dataframe.
The syntax for the duplicated() function is as follows:
DataFrame.duplicated(subset=None, keep='first')
The duplicated() function takes the following parameter values:
subset (optional): A column label or sequence of labels naming the columns in which duplicates are to be identified. By default, all columns are used.
keep (optional): Takes any of the following values:
    "first": Marks duplicates as True except for the first occurrence. This is the default.
    "last": Marks duplicates as True except for the last occurrence.
    False: Marks all duplicates as True. Note that this is the Boolean False, not the string "false".

The duplicated() function returns a Boolean Series with one value per row, which is True where the row is a duplicate.
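As a minimal sketch of how the three keep values differ, the snippet below runs duplicated() on a small made-up dataframe. The column names and values here are illustrative assumptions and are not part of the original example.

```python
import pandas as pd

# Hypothetical data: rows 0, 2 and 3 are identical; row 1 is unique.
df = pd.DataFrame({"name": ["a", "b", "a", "a"], "score": [1, 2, 1, 1]})

first_mask = df.duplicated()             # keep="first" is the default
last_mask = df.duplicated(keep="last")   # only the last occurrence stays False
all_mask = df.duplicated(keep=False)     # Boolean False, not the string "false"

print(first_mask.tolist())  # [False, False, True, True]
print(last_mask.tolist())   # [True, False, True, False]
print(all_mask.tolist())    # [True, False, True, True]
```

The unique row is False in all three cases; only the treatment of the repeated rows changes.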
By default, the duplicated() function returns False for the first occurrence of a duplicated row and True for every later occurrence. With keep="last", the last occurrence is marked False and every earlier occurrence is marked True.
# A code to illustrate the duplicated() function
# importing the pandas library
import pandas as pd

# creating a dataframe
df = pd.DataFrame([["THEO",1,1,3,"A"],
                   ["Theo",1,1,3,"A"],
                   ["THEO",1,1,3,"A"]],
                  columns=list('ABCDE'))

# printing the dataframe
print(df)
print("\n")

# to check for duplicate rows
print(df.duplicated())
print("\n")

# marking every occurrence except the last as True
print(df.duplicated(keep = "last"))
print("\n")

# getting duplicates on column A
print(df.duplicated(subset = ["A"]))
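In the code above:
We import the pandas library.
We create a dataframe, df.
We print the dataframe.
We check for duplicate rows in the dataframe using the duplicated() function.
We return True for every occurrence of a duplicated row except the last, using the duplicated() function and passing "last" as the parameter value of keep.
We check for duplicates in column "A".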
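Because duplicated() returns a Boolean Series, one common use is as a row mask. The following is a small sketch, not part of the original example, that reuses the same three rows and the variable names mask, duplicates_only and without_duplicates, which are chosen here purely for illustration.

```python
import pandas as pd

# Same rows as the example dataframe: rows 0 and 2 are identical,
# while row 1 differs in letter case ("Theo" vs "THEO").
df = pd.DataFrame(
    [["THEO", 1, 1, 3, "A"], ["Theo", 1, 1, 3, "A"], ["THEO", 1, 1, 3, "A"]],
    columns=list("ABCDE"),
)

mask = df.duplicated()          # True only for row 2 (default keep="first")
duplicates_only = df[mask]      # rows flagged as duplicates
without_duplicates = df[~mask]  # rows that remain after dropping them

print(duplicates_only.index.tolist())     # [2]
print(without_duplicates.index.tolist())  # [0, 1]
```

Note that the comparison is case-sensitive, so "Theo" and "THEO" are treated as different values.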