In pandas, the duplicated() function returns a Boolean Series indicating which rows of a dataframe are duplicates of another row.
The syntax for the duplicated() function is as follows:
DataFrame.duplicated(subset=None, keep='first')
The duplicated() function takes the following parameter values:
subset (optional): A column label or sequence of labels denoting the columns in which duplicates are to be identified. By default, all columns are used.
keep (optional): This takes any of the following values:
"first": Marks every duplicate as True except for the first occurrence. This is the default.
"last": Marks every duplicate as True except for the last occurrence.
False: Marks all duplicates as True. Note that this is the Boolean value False, not the string "false".
The duplicated() function returns a Boolean Series with one value per row: True for rows marked as duplicates and False for all other rows.
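The three keep values can be compared side by side. The following is a minimal sketch on a small made-up dataframe (the data and the variable names first, last, and every are assumptions for illustration only), in which rows 0 and 2 are identical:

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 2 are identical, row 1 differs in column "A"
df = pd.DataFrame(
    [["THEO", 1], ["Theo", 1], ["THEO", 1]],
    columns=["A", "B"],
)

first = df.duplicated()             # keep="first" is the default
last = df.duplicated(keep="last")   # only the last occurrence stays False
every = df.duplicated(keep=False)   # every occurrence is marked True

print(first.tolist())   # [False, False, True]
print(last.tolist())    # [True, False, False]
print(every.tolist())   # [True, False, True]
```

Row 1 is False in all three cases because "Theo" and "THEO" are different strings, so that row has no duplicate.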
By default, the duplicated() function returns False for the first occurrence of a duplicated row and True for every later occurrence. Setting keep="last" reverses this: the last occurrence is set as False, while every earlier occurrence is set as True.
# Code to illustrate the duplicated() function

# importing the pandas library
import pandas as pd

# creating a dataframe
df = pd.DataFrame([["THEO", 1, 1, 3, "A"],
                   ["Theo", 1, 1, 3, "A"],
                   ["THEO", 1, 1, 3, "A"]],
                  columns=list('ABCDE'))

# printing the dataframe
print(df)
print("\n")

# to check for duplicate rows
print(df.duplicated())
print("\n")

# marking every occurrence except the last as True
print(df.duplicated(keep="last"))
print("\n")

# getting duplicates on column A
print(df.duplicated(subset=["A"]))
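In the code above, we do the following:
We import the pandas library.
We create a dataframe, df.
We print the dataframe.
We check for duplicate rows in the dataframe using the duplicated() function.
We return True for every occurrence of a duplicated row except the last using the duplicated() function and passing "last" as the parameter value of keep.
We check for duplicates in column "A".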
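The Boolean Series returned by duplicated() can also serve as a mask for selecting rows. The sketch below assumes a small made-up dataframe (the data and the names mask and dupes are hypothetical). It passes two column labels to subset, then uses the result to count and select the rows marked as duplicates:

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame(
    {
        "A": ["THEO", "Theo", "THEO", "THEO"],
        "B": [1, 1, 1, 2],
        "C": ["x", "y", "z", "x"],
    }
)

# consider only columns A and B when looking for duplicates
mask = df.duplicated(subset=["A", "B"])
print(mask.tolist())         # [False, False, True, False]

# number of rows marked as duplicates
print(int(mask.sum()))       # 1

# select the rows marked as duplicates with Boolean indexing
dupes = df[mask]
print(dupes.index.tolist())  # [2]
```

Column C is ignored here, so row 2 counts as a duplicate of row 0 even though their values in C differ.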