How to solve the tag validator problem in Python

Validating HTML tags is an important part of web development, especially when handling user-generated content, such as form submissions, which may contain malformed or malicious markup. Developers need a way to validate and sanitize the tags in user input so that the data a web application processes is correctly formatted. Python offers several approaches to the tag validator problem. In this Answer, we’ll implement a custom tag validation function that uses a stack data structure.

Note: This solution doesn’t handle void tags written without a trailing slash, such as <br> or <img>, which don’t require closing tags. Self-closing forms, such as <img/>, are skipped.

Algorithm

The following are the steps to create a custom function:

Define a stack to hold the opening tags encountered so far in the string being checked.
Define an index variable to keep track of the current position in the string.
Iterate through the string that contains our code.
If an opening tag is found (a < that isn’t followed by /), append it to the stack.
If a closing tag is found instead, compare it to the tag at the top of the stack. If they match, pop the top of the stack. If they don’t match, or the stack is empty, return False.
Process the entire string this way. If the stack is empty at the end, return True. If there are still items in the stack, some tags were never closed, so return False.
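
To make the steps concrete, here is a minimal sketch that records the contents of the stack after every tag. It isn’t the function developed in this Answer: the name trace_stack is hypothetical, it finds tags with a regular expression instead of a manual index, and it assumes that tags carry no attributes.

```python
import re

def trace_stack(markup):
    """Return the stack contents after each tag, or None on a mismatch."""
    stack, snapshots = [], []
    # Assumption: tags are plain names with no attributes, e.g. <div> and </div>
    for match in re.finditer(r"<(/?)(\w+)>", markup):
        is_closing, name = match.group(1) == "/", match.group(2)
        if not is_closing:
            stack.append(name)       # Opening tag: push it
        elif stack and stack[-1] == name:
            stack.pop()              # Matching closing tag: pop the top
        else:
            return None              # Closing tag with nothing to match
        snapshots.append(list(stack))
    return snapshots

for step in trace_stack("<html><div></div></html>"):
    print(step)
# Prints:
# ['html']
# ['html', 'div']
# ['html']
# []
```

The stack grows with each opening tag and shrinks with each matching closing tag, so a valid string ends with an empty stack.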

The following function implements these steps:

def tagvalidator(str1):
    if not str1 or '<' not in str1:
        return False  # Empty input or no tags at all
    stack = []  # Opening tags that haven't been closed yet
    i = 0
    while i < len(str1):
        if str1[i] == '<' and not str1.startswith('</', i):
            j = i + 1
            while j < len(str1) and str1[j] != '>':
                j += 1
            if j < len(str1):
                tag = str1[i:j + 1]  # Slice out the whole tag, e.g. "<div>"
                if tag[-2] != '/':  # Skip self-closing tags such as "<img/>"
                    stack.append(tag)
                i = j
        elif str1.startswith('</', i):
            j = i + 2
            while j < len(str1) and str1[j] != '>':
                j += 1
            if j < len(str1):
                end = str1[i:j + 1]  # Slice out the closing tag, e.g. "</div>"
                if not stack or stack[-1][1:] != end[2:]:
                    return False  # Doesn't match the most recent opening tag
                stack.pop()
                i = j
        i += 1

    return len(stack) == 0
str1 = "<html><body><img/><div></div></body></html>"
print(tagvalidator(str1))  # True

This code is explained as follows:

Line 1: Define the tagvalidator function, which takes str1 as a parameter.
Lines 2–3: Return False if the string is empty or doesn’t contain any tags.
Lines 4–5: Define stack, which keeps track of the opening tags encountered in the code, and the index i.
Line 6: Loop through each character in the string.
Lines 7–15: When an opening tag is encountered, we find where it ends by searching for the > character and slice it out of the string. We also check whether the tag is self-closing; if it isn’t, we append it to stack.
Lines 16–25: When a closing tag is encountered, we extract it in the same way. We check whether it matches the most recent opening tag in stack. If it doesn’t, or if stack is empty, we return False. If it matches, the most recent opening tag is popped from stack.
Line 28: When we’ve iterated through the entire string, check whether stack is empty. If it is, the string is valid and True is returned.
Lines 29–30: Call the tagvalidator function with the sample string str1 and print the result.
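
The function above compares the full text of each tag, so an opening tag with attributes, such as <div class="box">, never matches </div>. One possible way around this is to compare tag names only. The following is a sketch under stated assumptions, not a complete HTML validator: the name validate_tag_names, the regular expression, and the small VOID_TAGS set are all illustrative choices, and attribute values that contain > aren't handled.

```python
import re

# Assumption: a small, non-exhaustive set of void tags chosen for illustration
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

# Group 1 is "/" for a closing tag, group 2 is the tag name,
# and group 3 is "/" for a self-closing tag such as <img/>
TAG_PATTERN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*?(/?)>")

def validate_tag_names(markup):
    stack = []
    found_tag = False
    for closing, name, self_closing in TAG_PATTERN.findall(markup):
        found_tag = True
        name = name.lower()
        if closing:
            # A closing tag must match the most recent unclosed opening tag
            if not stack or stack[-1] != name:
                return False
            stack.pop()
        elif not self_closing and name not in VOID_TAGS:
            # Only tags that expect a closing tag go on the stack
            stack.append(name)
    return found_tag and not stack

print(validate_tag_names('<div class="box"><br><img src="a.png"/></div>'))  # True
print(validate_tag_names("<div><p></div></p>"))  # False
```

Storing only the name keeps the comparison independent of attributes, and the void-tag set lets <br> and <img> appear without closing tags.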

Alternatively, we can use the HTMLParser class from Python’s html.parser module, together with its feed method, for this purpose. The code is provided below:

from html.parser import HTMLParser
class TagValidator(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []  # Opening tags that haven't been closed yet
    def handle_starttag(self, tag, attrs):
        if not tag.endswith('/'):  # Defensive check; tag is the bare tag name
            self.stack.append(tag)
    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            print("Invalid Tag")
            raise SystemExit  # Stop the script; exit() is meant for the interactive shell
str1 = "<html><head/><body></body></html>"
parser = TagValidator() # Creating an instance of our class
parser.feed(str1) # Feed the HTML content (str1) to the parser
if not parser.stack and ('<' in str1 or '>' in str1):
    print("Valid")
else:
    print("Invalid")

This code can be explained as follows:

Line 1: We import the HTMLParser class from the html.parser module.
Line 2: Define the TagValidator class, which inherits from HTMLParser.
Lines 3–5: Define the constructor, which calls the parent constructor and creates an empty stack.
Lines 6–8: Define the handle_starttag method, which appends the opening tag to the stack. For a self-closing tag such as <head/>, HTMLParser calls handle_startendtag, which by default calls handle_starttag and then handle_endtag, so the tag is pushed and immediately popped.
Lines 9–14: Define the handle_endtag method, which checks that the stack isn’t empty and that the most recent tag in the stack matches the closing tag encountered in our code. If it does, pop the stack; otherwise, print “Invalid Tag” and stop the program.
Lines 15–17: Define the sample string, create an instance of the class, and feed the string to the parser.
Lines 18–21: Print “Valid” if the stack is empty after parsing and the string contains at least one angle bracket; otherwise, print “Invalid”. This means an empty string or a string without any tags is also reported as “Invalid”.
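
The script above prints its verdict and stops the program on the first mismatch, which makes it awkward to reuse. The following sketch shows one way the same idea could be wrapped in a function that returns a Boolean. The names NestingChecker and is_valid_markup, and the seen_tag and mismatch flags, are hypothetical and not part of the html.parser module.

```python
from html.parser import HTMLParser

class NestingChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []        # Opening tags that haven't been closed yet
        self.seen_tag = False  # Becomes True once any tag is parsed
        self.mismatch = False  # Becomes True on a wrongly nested closing tag

    def handle_starttag(self, tag, attrs):
        self.seen_tag = True
        self.stack.append(tag)

    def handle_endtag(self, tag):
        self.seen_tag = True
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            # Record the problem instead of exiting the program
            self.mismatch = True

def is_valid_markup(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()  # Process any text still buffered by the parser
    return checker.seen_tag and not checker.mismatch and not checker.stack

print(is_valid_markup("<html><head/><body></body></html>"))  # True
print(is_valid_markup("<div><p></div>"))  # False
```

Recording a mismatch in a flag, rather than exiting, lets the caller decide what to do with an invalid string.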
