LEX lexical analyzer generator

LEX is a tool that generates a lexical analyzer (scanner) from a specification of patterns and actions. The generated analyzer reads an input string or file and breaks it into tokens. LEX is commonly used together with YACC, which consumes those tokens, to build complete parsers.

There are several implementations of LEX, but the most popular is flex, which is readily available as a package on most Linux distributions.


Parts of the LEX program

The layout of a LEX source program is:

  • Definitions

  • Rules

  • Auxiliary routines

A double percent sign (%%) separates each section.

Definitions

The definitions are at the top of the LEX source file. This section includes the regular definitions, C directives, and any global variable declarations. Code specific to C is placed between %{ and %}.

For example:

%{

#include <stdio.h>

int globalVariable = 0;

%}

Rules

This section may contain multiple rules. Each rule consists of:

  • A regular expression (the pattern)

  • A piece of code (the action)

The action executes whenever text in the input stream matches the rule's regular expression.

Auxiliary routines

This section includes functions that may be required in the rules section. Here, a function is written in regular C syntax. Most simple lexical analyzers only require the main() function.

The yytext variable holds the current lexeme, that is, the text matched by the most recent rule. (A lexeme is a basic unit of meaning in a language, consisting of a word or a group of words.) The generated scanner code is placed into a function called yylex(), which the main() function typically calls to start the analysis.

Example

The following LEX program converts every decimal number on a line of input to hexadecimal. LEX programs are saved in files with the .l extension.

%{
#include <stdlib.h>
#include <stdio.h>
int count = 0;
%}

%%
[0-9]+ { int no = atoi(yytext);
	printf("%x",no);
	count++;
      }
[\n]      return 0;
%%

int main(void){
	printf("Enter any number(s) to be converted to hexadecimal:\n");
	yylex();
	printf("\n");
	return 0;
}

Explanation

The explanation of the file above is as follows:

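  • Definitions section (between %{ and %}): We include the header files and declare a global variable count.

  • Rules section (between the two %% markers): We define the pattern for the expected token, a decimal number made of one or more digits 0-9, together with the action that prints it in hexadecimal. A second rule returns from yylex() when a newline is read.

  • Auxiliary routines (after the second %%): We define the main() function, which calls the yylex() function.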

Execution

Type the following commands in the terminal.

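  • Type lex lexfile.l and press enter to generate the C source file lex.yy.c from the Lex file.

  • Type gcc -o output lex.yy.c -ll and press enter to compile the C file into an executable. The example does not define yywrap(), so it relies on the default from the lex library; with flex, that library is usually linked with -lfl instead.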

  • Type ./output to execute the program.

Example

Run the commands above, then enter a number. If you type 46, the program prints 2e, because the %x format specifier produces lowercase hexadecimal digits.


Copyright ©2025 Educative, Inc. All rights reserved