Regex for phone number

Extracting phone numbers from text is a common task in text processing and data extraction. Regular expressions (regex) provide a flexible way of recognizing phone numbers in a string. In this Answer, we'll use regex in Python to define a pattern capable of identifying phone numbers in various formats.

Understanding phone number patterns

To accommodate several common ways of writing a 10-digit phone number, we'll define a regex pattern that will match the following in a given string:

  • An optional + character in the beginning

  • An optional country code (1–3 digits)

  • The area code (3 digits)

  • The first three digits of the phone number

  • The last four digits of the phone number

  • Optional separators ('-', '.', or ' ') after the country code, the area code, or the first three digits of the phone number, and optional parentheses around the area code

Defining the regex

Here's a regex conforming to the above rules:

pattern = r'\b(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})\b'

Let's break it down:

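  • \b: Matches a word boundary. This ensures that the match cannot start in the middle of a longer run of digits or letters.

  • (?:\+?(\d{1,3}))?: Matches an optional country code with an optional '+' sign in the beginning. Note that \b never matches between a space and '+', so when the '+' follows whitespace, the match starts at the first digit and the '+' itself is left out of the matched text.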

  • [-. (]*: Matches zero or more occurrences of the characters '-', '.', ' ', or '(' before the area code.

  • (\d{3}): Matches the area code (3 digits).

  • [-. )]*: Matches zero or more occurrences of the characters '-', '.', ' ', or ')' after the area code.

  • (\d{3}): Matches the first three digits of the phone number.

  • [-. ]*: Matches zero or more occurrences of the characters '-', '.', or ' ' after the first three digits of the phone number.

  • (\d{4}): Matches the last four digits of the phone number.

  • \b: Matches word boundary. This is to ensure that the match occurs at the end of a word.
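The breakdown above can also be written into the pattern itself. The following is a minimal sketch, assuming Python's re.VERBOSE flag, which ignores whitespace outside character classes and allows comments inside the pattern. The name PHONE_RE and the sample sentences are chosen here for illustration only.

```python
import re

# Same pattern as above, split across lines with one comment per part.
PHONE_RE = re.compile(r"""
    \b                   # word boundary before the first digit
    (?:\+?(\d{1,3}))?    # group 1: optional country code, optional plus sign
    [-. (]*              # optional separators or an opening parenthesis
    (\d{3})              # group 2: area code
    [-. )]*              # optional separators or a closing parenthesis
    (\d{3})              # group 3: first three digits
    [-. ]*               # optional separators
    (\d{4})              # group 4: last four digits
    \b                   # word boundary after the last digit
""", re.VERBOSE)

match = PHONE_RE.search("Call +1 (345) 678 9012 today")
print(match.groups())   # ('1', '345', '678', '9012')
print(match.group(0))   # 1 (345) 678 9012

# Without a country code, group 1 does not take part in the match.
print(PHONE_RE.search("(456) 789-0123").groups())  # (None, '456', '789', '0123')
```

On a match object, a group that did not take part in the match is reported as None, whereas re.findall() reports it as an empty string. The matched text in the first example starts at the digit 1 rather than at '+', because \b cannot match between a space and '+'.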

Testing the regex

Let's test our regex pattern to extract phone numbers from a string.

The test string in the code below also contains some card numbers, so that we can check that the regex pattern matches only the phone numbers in the string and ignores the card numbers:

# importing the regex library
import re
# Creating a test string with phone and card numbers
testString = """
Here's a list of Phone numbers:
- +123-456-7890
- (456) 789-0123
- 789 012 3456
- 234.567.8901
- +1 (345) 678 9012
Here's a list of card numbers:
- 4024 0071 4058 8885
- 5454 5454 5454 5454
- 6011 1111 1111 1117
- 4539 1701 4786 0804
- 3714 4963 5398 4312
"""
# Defining the regex pattern
pattern = r'\b(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})\b'
# Pattern matching to extract phone numbers
phoneNumbers = re.findall(pattern, testString)
# Printing the phone numbers extracted from the test string
for match in phoneNumbers:
    print("Phone Number:", ''.join(match))

Feel free to change the phone numbers in the test string and observe the output.

Explanation

  • Line 2: We import the Python re module, which provides support for regular expressions.

  • Lines 4–17: We define a testString that contains dummy phone numbers in various formats, as well as some dummy card numbers.

  • Line 19: We define our regex pattern.

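  • Line 21: We call re.findall(), which returns a list of tuples, one per match. Each tuple holds the four captured groups (country code, area code, first three digits, and last four digits), with an empty string where the country code is absent. The regex matches all the phone numbers in the string, but not the card numbers.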

  • Lines 23–24: We iterate over the returned list and use the join() method to convert each tuple to a string and print it on the console.
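Joining the tuple drops the separators and runs the country code into the rest of the number. As an alternative, here is a minimal sketch that uses re.finditer() to keep the groups apart and print each number in one uniform style. The helper format_phone, the output style, and the sample text are assumptions made for illustration, not part of the code above.

```python
import re

pattern = r'\b(?:\+?(\d{1,3}))?[-. (]*(\d{3})[-. )]*(\d{3})[-. ]*(\d{4})\b'

def format_phone(match):
    """Hypothetical helper: render one match as 'AAA-PPP-LLLL', with '+CC ' in front if present."""
    country, area, prefix, line = match.groups()
    national = f"{area}-{prefix}-{line}"
    # On a match object, an absent country code is None.
    return f"+{country} {national}" if country else national

text = "Office: (456) 789-0123, mobile: +1 (345) 678 9012"
formatted = [format_phone(m) for m in re.finditer(pattern, text)]
print(formatted)  # ['456-789-0123', '+1 345-678-9012']
```

re.finditer() yields match objects instead of tuples, so match.group(0), match.start(), and match.end() are also available if the original spelling or position of each number is needed.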
