How to filter valid email addresses from a series in pandas

The pandas library, combined with Python's regex module (re), lets us filter a series of strings and keep only the valid email addresses.

For details about regex syntax, see the documentation for Python's re module.

Method 1

In this method, we use map() to apply a function to each entry of the input_data series. The function calls regex match(), which returns a match object if the valid email pattern matches at the start of the entry and None otherwise; bool() converts that result to True or False. The resulting Boolean series is then used as a mask to select the valid entries from input_data.

#importing pandas and regex libraries
import pandas as pd
import re as regex
#initializing pandas series
input_data = pd.Series(['educative.io', 'jobs@educative.io', 'edpresso@educative.io'])
#initializing valid email pattern (may vary)
pattern = r'[0-9a-zA-Z._%+-]+@[0-9a-zA-Z.-]+\.[A-Za-z]{2,4}'
#mapping valid emails
mapped_result = input_data.map(lambda i: bool(regex.match(pattern, i)))
print("Valid Emails are: ")
print(input_data[mapped_result])
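
Because match() only anchors the pattern at the start of a string, an entry with extra text after a valid address still passes the check above. The following is a minimal sketch, not part of the original method, that contrasts match() with re.fullmatch(), which requires the whole entry to match. The samples series is a hypothetical input chosen for illustration.

```python
import re
import pandas as pd

# same pattern as in Method 1, written as a raw string
pattern = r'[0-9a-zA-Z._%+-]+@[0-9a-zA-Z.-]+\.[A-Za-z]{2,4}'

# hypothetical sample data: the second entry has trailing text after the address
samples = pd.Series(['jobs@educative.io', 'jobs@educative.io and more', 'educative.io'])

# match() succeeds if the pattern matches at the start of the string
prefix_mask = samples.map(lambda s: bool(re.match(pattern, s)))

# fullmatch() succeeds only if the pattern matches the entire string
full_mask = samples.map(lambda s: bool(re.fullmatch(pattern, s)))

print(prefix_mask.tolist())  # [True, True, False]
print(full_mask.tolist())    # [True, False, False]
```

If every entry is expected to contain only an address, the stricter fullmatch() mask is probably the safer choice.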

Method 2

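In this method, we use the pandas str.findall() method to find all occurrences of the valid email pattern in each entry of input_data. It returns a series of lists, one per entry. Entries that contain no valid email produce an empty list, which we filter out before printing.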

#importing pandas library
import pandas as pd
#initializing pandas series
input_data = pd.Series(['educative.io', 'jobs@educative.io', 'edpresso@educative.io'])
#initializing valid email pattern (may vary)
pattern = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}'
#finding all matching occurrences
result = input_data.str.findall(pattern)
print("Valid Emails are: ")
print([i for i in result if len(i) > 0])
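
The code above prints a list of lists, because str.findall() returns one list per entry. As a possible variation, and not part of the original method, the sketch below flattens the result with explode() and dropna() so that each address becomes a plain string and keeps its original index. The names matches and flat_emails are illustrative.

```python
import pandas as pd

input_data = pd.Series(['educative.io', 'jobs@educative.io', 'edpresso@educative.io'])
pattern = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}'

# one list of matches per entry; entries without a match give an empty list
matches = input_data.str.findall(pattern)

# explode() turns each list element into its own row (empty lists become NaN),
# and dropna() removes the rows that had no match
flat_emails = matches.explode().dropna()

print(flat_emails.tolist())  # ['jobs@educative.io', 'edpresso@educative.io']
```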

Method 3

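In this method, we call regex findall() on each entry of input_data inside a list comprehension. For each entry, it returns a list of all substrings that match the valid email pattern. Empty lists, which mean that no valid email was found, are filtered out before printing.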

#importing pandas and regex libraries
import pandas as pd
import re as regex
#initializing pandas series
input_data = pd.Series(['educative.io', 'jobs@educative.io', 'edpresso@educative.io'])
#initializing valid email pattern (may vary)
pattern = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}'
#finding all matching occurrences
result = [regex.findall(pattern, email) for email in input_data]
print("Valid Emails are: ")
print([i for i in result if len(i) > 0])
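
Since findall() searches for matches anywhere in a string, it can also pull several addresses out of one entry of free text. The sketch below illustrates this under the assumption that an entry may hold more than one address. The text value is a hypothetical input, and precompiling the pattern with re.compile() is an optional convenience, not something the original method requires.

```python
import re

# compile the pattern once so it can be reused across many strings
email_regex = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}')

# hypothetical free-text entry containing two addresses
text = 'contact jobs@educative.io or edpresso@educative.io'

# findall() returns every non-overlapping match, in order of appearance
found = email_regex.findall(text)

print(found)  # ['jobs@educative.io', 'edpresso@educative.io']
```

This also means that findall() reports an entry as valid whenever an address appears somewhere inside it, even if the entry contains other text.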
