How to create a lexer using PLY in Python

PLY (Python Lex-Yacc) is a Python library that is, in essence, an implementation of the popular parsing tools lex and yacc. The library consists of two modules, lex.py and yacc.py, which create a lexer and a parser, respectively. In this Answer, we'll explore the lex.py module to create a lexical analyzer for a simple calculator.

Creating a lexer

For simplicity, we’ll create a lexer that can tokenize a string of numbers and basic arithmetic operators (+, -, *, /) and print an error if it encounters any other character. Let’s start by importing the library:

import ply.lex as lex

Defining tokens

Next, we'll create a list of token names that our lexer will be able to recognize:

# Define token names
tokens = (
    'PLUS',
    'MINUS',
    'MULTIPLY',
    'DIVIDE',
    'INTEGER'
)

Defining token rules

Once the tokens have been defined, the next step is to define a regex for each token. Each token name is prefixed with t_ to help PLY distinguish token rules from other definitions. Rules for the simpler tokens can be defined as strings, as shown below:

t_PLUS = r'\+'
t_MINUS = r'\-'
t_MULTIPLY = r'\*'
t_DIVIDE = r'/'
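
Note the backslashes: + and * are regex metacharacters (quantifiers), so they must be escaped to match the literal operator characters; escaping - is optional outside a character class but harmless. A quick sanity check using only Python's built-in re module (independent of PLY):

```python
import re

# Escaped patterns match the literal operator characters.
print(re.findall(r'\+', '1 + 2 * 3'))   # ['+']
print(re.findall(r'\*', '1 + 2 * 3'))   # ['*']

# An unescaped '+' is an invalid pattern on its own ("nothing to repeat").
try:
    re.compile(r'+')
except re.error as exc:
    print('invalid pattern:', exc)
```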

For the complex tokens, the rules can be defined using a function:

def t_INTEGER(t):
    r'\d+'
    t.value = int(t.value)
    return t

In the rule for the INTEGER token, the function's docstring holds the regex that identifies numbers. We then convert the matched string to a Python integer and return the token. This ensures that numbers are treated as integers rather than strings.
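
The effect of the conversion can be seen with a quick stdlib-only check (using Python's re module, not PLY):

```python
import re

match = re.match(r'\d+', '42 + 7')   # same pattern as t_INTEGER
raw = match.group()                  # '42' -- still a string
value = int(raw)                     # 42  -- what t_INTEGER stores in t.value

print(repr(raw), repr(value))   # '42' 42
print(raw * 2, value * 2)       # '4242' vs 84: why the conversion matters
```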

Since the input string can contain whitespace, we'd like our lexer to ignore it. We can use the t_ignore rule, reserved by ply.lex for characters that should be skipped, to disregard any spaces or tabs in the input data:

# Define regex to ignore whitespace
t_ignore = ' \t'
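
t_ignore handles spaces and tabs, but newlines deserve a separate rule: PLY never updates line numbers on its own. The conventional t_newline rule increments t.lexer.lineno manually. Here's a condensed, self-contained sketch (reusing a subset of the tokens above, with an assumed two-line input):

```python
import ply.lex as lex

tokens = ('PLUS', 'INTEGER')

t_PLUS = r'\+'
t_ignore = ' \t'

def t_INTEGER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)   # keep lineno in sync with the input

def t_error(t):
    print(f"Illegal character '{t.value[0]}'")
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input('1 +\n2')
toks = list(lexer)                   # a PLY lexer is also iterable
for tok in toks:
    print(tok.type, tok.value, tok.lineno)
```

Because t_newline returns nothing, the newline itself produces no token; it only advances the line counter, so the final INTEGER reports line 2.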

Finally, we'll define a rule to handle any errors while lexing:

# Define a rule to handle errors while lexing
def t_error(t):
    print(f"Illegal character '{t.value[0]}'")
    t.lexer.skip(1)

Lexer execution and token generation

We have defined the rules for our lexer. All we have to do now is create an instance of the lexer, feed it the input string, and iterate over the generated tokens using a while loop:

lexer = lex.lex()
test = '1 + 1 * 3 - 4 / 2'
lexer.input(test)
while True:
    token = lexer.token()
    if not token:
        break
    print(token)
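
Each value returned by lexer.token() is a LexToken object whose printed form shows the token's type, value, line number, and position; the same fields are available as attributes. A condensed, self-contained sketch (assuming a minimal two-token lexer):

```python
import ply.lex as lex

# Condensed two-token lexer, just to inspect one LexToken object.
tokens = ('PLUS', 'INTEGER')

t_PLUS = r'\+'
t_ignore = ' \t'

def t_INTEGER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input('1 + 2')
first = lexer.token()
# A LexToken carries type, value, lineno, and lexpos attributes.
print(first.type, first.value, first.lineno, first.lexpos)   # INTEGER 1 1 0
```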

Testing the lexer

Let's combine all of the code snippets and test the lexer we created. For each token in the input string test, the following code will print its type, value, line number, and position (the character index at which the token starts):

import ply.lex as lex

# Define token names
tokens = (
    'PLUS',
    'MINUS',
    'MULTIPLY',
    'DIVIDE',
    'INTEGER'
)

# Define regex for each token
t_PLUS = r'\+'
t_MINUS = r'\-'
t_MULTIPLY = r'\*'
t_DIVIDE = r'/'

def t_INTEGER(t):
    r'\d+'
    t.value = int(t.value)
    return t

# Define regex to ignore whitespace
t_ignore = ' \t'

# Define a rule to handle errors while lexing
def t_error(t):
    print(f"Illegal character '{t.value[0]}'")
    t.lexer.skip(1)

lexer = lex.lex()
test = '1 + 1 * 3 - 4 / 2'
lexer.input(test)
while True:
    token = lexer.token()
    if not token:
        break
    print(token)

Our lexer functions as intended. The ply.lex module is pretty straightforward and easy to use. Feel free to extend this code to construct a lexer capable of recognizing and extracting complex tokens.
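
As a starting point for such an extension, here's a hedged sketch that adds floating-point numbers and parentheses (the token names FLOAT, LPAREN, and RPAREN are our own choices, not part of the original lexer). PLY tries function rules in the order they are defined, so t_FLOAT must appear before t_INTEGER; otherwise '1.5' would lex as INTEGER 1, an illegal '.', and INTEGER 5:

```python
import ply.lex as lex

tokens = (
    'PLUS', 'MINUS', 'MULTIPLY', 'DIVIDE',
    'INTEGER', 'FLOAT', 'LPAREN', 'RPAREN',
)

t_PLUS = r'\+'
t_MINUS = r'\-'
t_MULTIPLY = r'\*'
t_DIVIDE = r'/'
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_ignore = ' \t'

# Defined before t_INTEGER so '1.5' matches as a FLOAT first.
def t_FLOAT(t):
    r'\d+\.\d+'
    t.value = float(t.value)
    return t

def t_INTEGER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    print(f"Illegal character '{t.value[0]}'")
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input('(1.5 + 2) * 3')
toks = list(lexer)
for tok in toks:
    print(tok.type, tok.value)
```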

Copyright ©2025 Educative, Inc. All rights reserved