The pandas library provides many functions for data manipulation. We can combine text from multiple rows into a single string using the groupby() method, which splits a DataFrame into groups of rows that share the same value in one or more columns. The example below uses groupby() together with the following:
join() function: This string method concatenates the strings in an iterable, placing a separator between them.
lambda function: An anonymous function, that is, a function defined inline without a name.

    # import pandas library
    import pandas as pd

    # creating a DataFrame
    df = pd.DataFrame([
        (1, '2015', 10, 'A1'),
        (2, '2015', 30, 'A2'),
        (3, '2017', 35, 'A4'),
        (4, '2022', 45, 'A2'),
        (5, '2021', 44, 'A3'),
        (6, '2020', 50, 'A3'),
    ], columns=['No.', 'Year', 'Marks', 'Assignment'])

    # print DataFrame on console
    print(df)

    # join the Assignment values of all rows that share the same Year
    df['Assignment'] = df.groupby(['Year'])['Assignment'].transform(lambda x: ' '.join(x))

    # drop rows that are exact duplicates across all columns
    df = df.drop_duplicates()

    # print updated DataFrame
    print(df)
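The two building blocks named above can also be tried in isolation, without pandas. This is a minimal sketch; the names words and join_with_space are hypothetical and chosen only for illustration.

```python
# str.join is called on the separator and takes an iterable of strings.
words = ['A1', 'A2', 'A3']
joined = ' '.join(words)
print(joined)  # A1 A2 A3

# A lambda is a function without a name. Binding it to a variable here
# is only for demonstration; in the example it is passed inline.
join_with_space = lambda values: ' '.join(values)
print(join_with_space(words))  # A1 A2 A3
```

Note that join() expects every item to be a string, so a column of numbers would need to be converted first.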
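The pandas library is imported under the alias pd, and a DataFrame, df, is created with the columns No., Year, Marks, and Assignment.
The expression df.groupby(['Year'])['Assignment'] groups the rows of df by their ['Year'] values and selects the ['Assignment'] column within each group. The transform(lambda x: ' '.join(x)) call is then applied to join the existing ['Assignment'] values of each group. In pandas, transform() returns a result with the same index as the original data, so every row receives the combined value computed for its group. The ' '.join(x) function concatenates the values in a group, separated by a space character (' ').
The df.drop_duplicates() function removes duplicate rows (not columns) from the DataFrame, df. A row is dropped only if it matches an earlier row in every column; because the No. and Marks values differ in this data, all six rows are kept. Next, we print the DataFrame, df.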
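The behavior of transform() and drop_duplicates() described above can be checked on a smaller table. The sketch below is a variation on the example, not part of it: the three-row data and the name per_year are assumptions, and passing subset to drop_duplicates() is one possible way to keep a single row per year.

```python
import pandas as pd

# A shortened version of the example data (assumed for illustration).
df = pd.DataFrame(
    [(1, '2015', 10, 'A1'),
     (2, '2015', 30, 'A2'),
     (3, '2017', 35, 'A4')],
    columns=['No.', 'Year', 'Marks', 'Assignment'],
)

# transform returns one value per original row, so both 2015 rows
# receive the same joined string.
df['Assignment'] = df.groupby(['Year'])['Assignment'].transform(
    lambda x: ' '.join(x)
)
print(df['Assignment'].tolist())  # ['A1 A2', 'A1 A2', 'A4']

# Without arguments, drop_duplicates compares all columns, and
# 'No.' and 'Marks' still differ, so no row is removed.
print(len(df.drop_duplicates()))  # 3

# Restricting the comparison to Year and Assignment keeps one row per year.
per_year = df.drop_duplicates(subset=['Year', 'Assignment'])[['Year', 'Assignment']]
print(per_year['Assignment'].tolist())  # ['A1 A2', 'A4']
```

Whether to keep all rows or collapse them to one per year depends on whether the No. and Marks columns are still needed afterwards.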