Extracting phone numbers from a given text is a common task in text processing and data extraction. Regular expressions (regex) provide a flexible way to recognize phone numbers in a string. In this Answer, we'll use regex in Python to define a pattern capable of identifying phone numbers in various formats.
To accommodate a variety of phone number formats, we'll define a regex pattern that matches the following parts, in order, in a given string:
An optional + character in the beginning
An optional country code (1–3 digits)
The area code (3 digits)
The first three digits of the phone number
The last four digits of the phone number
Optional separators ('-', '.', or ' ') after the country code, the area code, and the first three digits of the phone number, plus optional parentheses around the area code
Here's a regex conforming to the above rules:
pattern = r'\b(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})\b'
Let's break it down:
\b: Matches a word boundary. This ensures that the match cannot start in the middle of a word or a longer run of digits.
(?:\+?(\d{1,3}))?: Matches an optional country code (1–3 digits) with an optional '+' sign in the beginning. Only the digits are captured.
[-. (]*: Matches zero or more occurrences of the characters '-', '.', ' ', or '(' before the area code.
(\d{3}): Matches the area code (3 digits).
[-. )]*: Matches zero or more occurrences of the characters '-', '.', ' ', or ')' after the area code.
(\d{3}): Matches the first three digits of the phone number.
[-. ]*: Matches zero or more occurrences of the characters '-', '.', or ' ' after the first three digits of the phone number.
(\d{4}): Matches the last four digits of the phone number.
\b: Matches a word boundary. This ensures that the match cannot end in the middle of a longer run of digits.
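The breakdown above can also be written directly into the pattern. The following is a minimal sketch, assuming Python's re.VERBOSE flag, which lets a pattern span several lines with comments; verbose_pattern and sample are hypothetical names introduced only for this illustration. It checks that the commented form finds the same groups as the one-line pattern.

```python
import re

# The one-line pattern from the text.
pattern = r'\b(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})\b'

# The same pattern spelled out with re.VERBOSE (verbose_pattern is a
# hypothetical name). In verbose mode, whitespace is ignored except inside
# a character class, so the space inside [...] is still matched.
verbose_pattern = re.compile(r"""
    \b                  # word boundary before the first digit
    (?:\+?(\d{1,3}))?   # optional plus sign and country code (group 1)
    [-. (]*             # optional separators before the area code
    (\d{3})             # area code (group 2)
    [-. )]*             # optional separators after the area code
    (\d{3})             # first three digits (group 3)
    [-. ]*              # optional separators
    (\d{4})             # last four digits (group 4)
    \b                  # word boundary after the last digit
""", re.VERBOSE)

sample = "Call +1 (345) 678 9012 or 234.567.8901."
one_line_result = re.findall(pattern, sample)
verbose_result = verbose_pattern.findall(sample)

print(one_line_result)
print(verbose_result)
```

Both calls should print the same list of tuples, with an empty string in the first position when no country code is present.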
Let's test our regex pattern to extract phone numbers from a string.
The test string also contains some card numbers, so we can check that the regex pattern matches only the phone numbers in the string and ignores the card numbers:
import re  # Importing the regex library
# Creating a test string with phone and card numbers
testString = """Here's a list of Phone numbers:
- +123-456-7890
- (456) 789-0123
- 789 012 3456
- 234.567.8901
- +1 (345) 678 9012

Here's a list of card numbers:
- 4024 0071 4058 8885
- 5454 5454 5454 5454
- 6011 1111 1111 1117
- 4539 1701 4786 0804
- 3714 4963 5398 4312

"""
# Defining the regex pattern
pattern = r'\b(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})\b'
# Pattern matching to extract phone numbers
phoneNumbers = re.findall(pattern, testString)
# Printing the phone numbers extracted from the test string
for match in phoneNumbers:
    print("Phone Number:", ''.join(match))
Feel free to change the phone numbers in the test string and observe the output.
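To make this kind of experimenting easier, the pattern can be wrapped in a small helper. This is only a sketch: extract_phone_numbers and PATTERN are hypothetical names, and the input strings are made up for illustration. It also shows that a card number written as four groups of four digits yields no match.

```python
import re

# Same pattern as in the text, compiled once for reuse.
PATTERN = re.compile(
    r'\b(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})\b'
)

def extract_phone_numbers(text):
    """Return each match as one string of digits (captured groups joined)."""
    return [''.join(groups) for groups in PATTERN.findall(text)]

phones = extract_phone_numbers("Office: 234.567.8901, mobile: +1 (345) 678 9012")
cards = extract_phone_numbers("Card: 4024 0071 4058 8885")

print(phones)  # ['2345678901', '13456789012']
print(cards)   # []
```

Note that the card number is rejected only because of how its digits are grouped; the pattern has no notion of what a card number is, so other digit layouts may still match.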
Line 1: We import the Python re module, which provides support for regular expressions.
Lines 3–17: We define a testString that contains dummy phone numbers in various formats, as well as some dummy card numbers.
Line 19: We define our regex pattern.
Line 21: We call the re.findall() function, which returns a list of tuples, one per match. Each tuple holds the text captured by the pattern's four groups, with an empty string for a country code that is absent. The regex matches all the phone numbers in testString, but none of the card numbers.
Lines 23–24: We iterate over the returned list and use the join() method to concatenate the groups of each tuple into a single string of digits, which we print on the console.
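Because findall() returns only the captured groups, the separators are lost when the tuple is joined. If the originally written form of each number is needed, one option is re.finditer() with group(0). The sketch below assumes a made-up input string; text, tuples, and full_matches are hypothetical names.

```python
import re

pattern = r'\b(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})\b'
text = "Reach us at +1 (345) 678 9012 or (456) 789-0123."

# findall() gives one tuple of captured groups per match.
tuples = re.findall(pattern, text)

# finditer() yields match objects; group(0) is the whole matched text.
# strip() drops a leading space that [-. (]* may pick up before the area code.
full_matches = [m.group(0).strip() for m in re.finditer(pattern, text)]

print(tuples)        # [('1', '345', '678', '9012'), ('', '456', '789', '0123')]
print(full_matches)  # ['1 (345) 678 9012', '(456) 789-0123']
```

In this run, the leading '+' is not part of the first match: \b needs a word character on one side, so a match cannot begin at a '+' that follows a space, and it starts at the first digit instead.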