Validating HTML tags is an important part of handling user-generated content, such as data submitted through forms, which may contain malformed or malicious markup. Developers need a way to validate and sanitize the tags in user input so that the data a web application processes is well formed. Python offers several ways to solve the tag validator problem. In this Answer, we’ll implement a custom tag validation function that uses a stack data structure.
Note: This solution doesn’t handle void tags, such as <br> or <img>, which don’t require closing tags.
The following are the steps to create a custom function:
Define a stack to hold the opening tags encountered so far.
Define an index variable to keep track of the current position in the string.
Iterate through the string that contains our code.
If an opening tag is found (a < that isn’t followed by /), append it to the stack.
If a closing tag is found instead (a < followed by /), compare it with the tag at the top of the stack. If they match, the pair is valid, so pop the top of the stack. If they don’t match, or the stack is empty, return False.
Process the entire string this way. If the stack is empty at the end, return True. If there are still items in the stack, some tags were never closed, so return False.
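To make the push and pop behavior concrete, here is a minimal sketch that traces the stack for a list of tags that have already been split out of the string. The name trace_tags is hypothetical and used only for illustration. The sketch assumes plain tags without attributes or self-closing slashes.

```python
def trace_tags(tags):
    """Print the stack after each tag and report whether the tags balance."""
    stack = []
    for tag in tags:
        if tag.startswith("</"):
            name = tag[2:-1]  # "</body>" -> "body"
            if not stack or stack[-1] != name:
                print(f"{tag}: mismatch, stack={stack}")
                return False
            stack.pop()
        else:
            stack.append(tag[1:-1])  # "<body>" -> "body"
        print(f"{tag}: stack={stack}")
    # Any tag left on the stack was never closed.
    return not stack


trace_tags(["<html>", "<body>", "</body>", "</html>"])
# <html>: stack=['html']
# <body>: stack=['html', 'body']
# </body>: stack=['html']
# </html>: stack=[]
```

Each closing tag removes the most recently opened tag, which is why a stack (last in, first out) fits this problem.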
The code for this problem is provided below:
def tagvalidator(str1):
    if not str1 or '<' not in str1:
        return False
    stack = []
    i = 0

    while i < len(str1):
        if str1[i] == '<' and not str1.startswith('</', i):  # opening tag
            j = i + 1
            while j < len(str1) and str1[j] != '>':
                j += 1
            if j < len(str1):
                tag = str1[i:j + 1]  # slice out the whole tag, e.g. "<div>"
                if tag[-2] != '/':  # skip self-closing tags such as "<img/>"
                    stack.append(tag)
                i = j
        elif str1.startswith('</', i):  # closing tag
            j = i + 2
            while j < len(str1) and str1[j] != '>':
                j += 1
            if j < len(str1):
                end = str1[i:j + 1]
                if not stack or stack[-1][1:] != end[2:]:  # e.g. "div>" vs "div>"
                    return False
                stack.pop()
                i = j
        i += 1

    return len(stack) == 0

str1 = "<html><body><img/><div></div></body></html>"
print(tagvalidator(str1))
This code is explained as follows:
Line 1: The definition of the tagvalidator function, which takes str1 as a parameter.
Lines 2–3: Return False if the string is empty or doesn’t contain any tags.
Line 4: Define stack, which keeps track of the opening tags encountered so far.
Line 5: Define i, the index of the current character.
Line 7: Loop through each character in the string.
Lines 8–16: When an opening tag is encountered, we find its end by searching for the closing > character and extract the tag. If the tag isn’t self-closing, we append it to stack.
Lines 17–26: When a closing tag is encountered, we extract it in the same way. If it doesn’t match the most recent opening tag on the stack, or the stack is empty, we return False. If they match, the most recent opening tag is popped from the stack.
Line 29: Once we’ve iterated through the entire string, check whether stack is empty. If it is, the code is valid and True is returned; otherwise, False is returned.
Lines 31–32: Define the sample string str1 and call the tagvalidator function on it.
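One limitation worth noting: the function above compares the raw tag text, so an opening tag with attributes, such as <div class="box">, won’t match </div>. The following is a minimal sketch of one possible variant that compares tag names instead. It swaps the manual character scan for a regular expression. The names TAG_PATTERN and validate_by_name are hypothetical, and the pattern assumes that attribute values never contain < or >.

```python
import re

# Group 1: optional leading slash (closing tag).
# Group 2: the tag name.
# Group 3: optional trailing slash (self-closing tag).
TAG_PATTERN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*?(/?)>")


def validate_by_name(markup):
    """Return True if every opening tag is closed in the right order."""
    stack = []
    found_tag = False
    for match in TAG_PATTERN.finditer(markup):
        found_tag = True
        closing, name, self_closing = match.groups()
        name = name.lower()  # HTML tag names are case-insensitive
        if closing:
            if not stack or stack[-1] != name:
                return False
            stack.pop()
        elif not self_closing:
            stack.append(name)
    # As in the original function, a string without tags is rejected.
    return found_tag and not stack


print(validate_by_name('<div class="box"><p>Hi</p></div>'))  # True
print(validate_by_name("<div><p></div></p>"))                # False
```

Because only the names are pushed onto the stack, attributes no longer affect the comparison.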
Alternatively, we can use the HTMLParser class from Python’s html.parser module, along with its feed method, for this purpose. The code is provided below:
from html.parser import HTMLParser

class TagValidator(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []  # names of the tags that are still open

    def handle_starttag(self, tag, attrs):
        if not tag.endswith('/'):
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            print("Invalid Tag")
            raise SystemExit  # stop at the first mismatched closing tag

str1 = "<html><head/><body></body></html>"
parser = TagValidator()  # Create an instance of our class
parser.feed(str1)  # Feed the HTML content (str1) to the parser

if not parser.stack and ('<' in str1 or '>' in str1):
    print("Valid")
else:
    print("Invalid")
This code can be explained as follows:
Line 1: We import the HTMLParser class from the html.parser module.
Line 3: The definition of the TagValidator class, which inherits from HTMLParser.
Lines 4–6: Define the constructor, which calls the parent constructor and initializes an empty stack.
Lines 8–10: Define the handle_starttag method, which appends the opening tag to the stack unless the tag name ends with /. Note that HTMLParser passes the tag name without a trailing slash and, by default, reports a self-closing tag such as <head/> as a start tag immediately followed by an end tag, so such a tag is pushed and then popped right away.
Lines 12–17: Define the handle_endtag method, which checks whether the stack is non-empty and whether the most recent tag on the stack matches the closing tag. If so, it pops the stack; otherwise, it prints “Invalid Tag” and exits the program.
Lines 19–21: Define the sample string, create a TagValidator instance, and pass the string to its feed method.
Lines 23–26: Print “Valid” if the stack is empty and the string contains at least one angle bracket; otherwise, print “Invalid”. This also covers a string that is empty or doesn’t contain any tags.
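The class above prints its result and exits the program on the first mismatch, which makes it awkward to call from other code. As a minimal sketch of one possible alternative, the variant below records the outcome in attributes and wraps the parser in a function that returns a boolean. The names BalancedTagChecker and is_balanced are hypothetical. The sketch relies on the default behavior of HTMLParser, where a self-closing tag triggers handle_starttag followed by handle_endtag.

```python
from html.parser import HTMLParser


class BalancedTagChecker(HTMLParser):
    """Track open tags and remember whether a mismatch was seen."""

    def __init__(self):
        super().__init__()
        self.stack = []       # names of the tags that are still open
        self.saw_tag = False  # True once any tag has been parsed
        self.balanced = True  # False after the first mismatched closing tag

    def handle_starttag(self, tag, attrs):
        self.saw_tag = True
        self.stack.append(tag)

    def handle_endtag(self, tag):
        self.saw_tag = True
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.balanced = False


def is_balanced(markup):
    """Return True if markup contains tags and all of them are balanced."""
    checker = BalancedTagChecker()
    checker.feed(markup)
    checker.close()
    return checker.saw_tag and checker.balanced and not checker.stack


print(is_balanced("<html><head/><body></body></html>"))  # True
print(is_balanced("<div><p></div></p>"))                  # False
```

Returning a value instead of calling exit lets the caller decide how to react to invalid input, for example by rejecting a form submission.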
Solve the quiz below to test your understanding of this problem:
In the “Tag Validator” problem, what is the primary purpose of using a stack data structure in Python?
To keep track of the order in which HTML tags are encountered.
To store the HTML attributes associated with each tag.
To count the total number of HTML tags in the document.
To perform mathematical calculations within the HTML content.