Simplifying type checking and data validation using Pydantic

Type checking verifies that the data types used in a computer program are correct. It’s a critical part of software development, as it helps prevent errors and improve code quality. Data validation is the process of ensuring that data is accurate, complete, and consistent. Validating data before using it in any application is important, as invalid data can lead to errors and incorrect results.

Pydantic is a Python library that provides a powerful and intuitive way to perform type checking and data validation. It leverages Python’s type annotations to define and validate data structures, making it easy to ensure that data is consistent and correct. Pydantic can be installed from the terminal with pip by running pip install pydantic. The examples in this article use the Pydantic v2 API, such as model_validate() and field_validator.

Type checking

Python’s type annotations are a way to hint to the type checker what type of data is expected for a particular variable or function parameter. Pydantic takes this one step further by allowing us to define custom constraints on the given data structures. For example, we can specify that a field must be a non-empty string, a positive integer, or a list of unique values.

To use Pydantic for type checking, we simply create a Pydantic model class and define the fields we need. The type annotations for the fields will specify the expected types. For example, the following code defines a model class for a user:

Here’s a breakdown of the code:

Line 1: We import the BaseModel class from the pydantic library.
Lines 3–6: We define a User class that inherits from the BaseModel class. This class defines the structure of a user object, which contains name, age, and email with their respective data types.
Lines 8–12: We create a user_data dictionary containing the data for a new user.
Line 14: We validate the user_data dictionary using the User.model_validate() class method. If the data is valid, this method returns a User object, which we assign to user. Otherwise, it raises a ValidationError exception.
Lines 16–18: We print the values of the user object’s name, age, and email attributes.
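The listing itself isn't reproduced here. A minimal sketch consistent with the breakdown above, assuming Pydantic v2, might look like the following. The sample values are hypothetical, and the try/except block is an addition so that the script reports the failing field instead of stopping with a traceback.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    name: str
    age: int
    email: str


# Hypothetical sample data; "twenty-five" cannot be parsed as an integer.
user_data = {
    "name": "Alice",
    "age": "twenty-five",
    "email": "alice@example.com",
}

try:
    user = User.model_validate(user_data)
    print(user.name)
    print(user.age)
    print(user.email)
except ValidationError as exc:
    # Each entry in exc.errors() describes one failing field.
    failed_fields = [error["loc"][0] for error in exc.errors()]
    print("Invalid fields:", failed_fields)
```

Note that in its default (lax) mode, Pydantic converts a numeric string such as "25" to the integer 25, so the sketch uses a value that can't be converted at all.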

Now, when we correct the input value for age and use an integer, the code should work. Here’s the updated code:
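As a sketch of the corrected version, again assuming Pydantic v2 and hypothetical sample values, the same model accepts the data once age is an integer:

```python
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int
    email: str


# Hypothetical sample data with age given as an integer.
user_data = {
    "name": "Alice",
    "age": 25,
    "email": "alice@example.com",
}

# No exception is raised, so we get back a validated User object.
user = User.model_validate(user_data)

print(user.name)
print(user.age)
print(user.email)
```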

Pydantic makes runtime type checking more concise and less error-prone than writing manual checks for every field.

Data validation

Pydantic can validate data in a number of ways, including range checking, regular expression matching, uniqueness checking, and custom validation.

Range checking

Range checking lets us validate data against a specified range of values. We can do this with the Field() and constr() functions, which constrain an integer’s value and a string’s length, respectively.

The min_length and max_length keyword arguments of constr() define the allowed range for the string length. The ge (greater than or equal to) and le (less than or equal to) keyword arguments of Field() define the inclusive bounds for the integer value. Here’s an example of using range checking to validate the name and age fields of a class:

Here’s an explanation:

Line 1: We import the BaseModel class and the constr and Field functions from the pydantic library.
Line 3: We define a User class that inherits from the BaseModel class.
Line 4: We define a name field for the User class. The name field is a string with a minimum length of 3 characters and a maximum length of 20 characters.
Line 5: We define an age field for the User class. The age field is an integer with a minimum value of 18 and a maximum value of 68.
Line 6: We define an email field for the User class. The email field is a string.
Lines 8–12: We create a user_data dictionary containing the data for a new user.
Line 14: We validate the user_data dictionary against the User model using the User.model_validate() class method, which returns a validated User object that we assign to user.
Lines 16–18: We print the data of the class object user.

When we run the code above, we see two errors:

The name string has a length of 2, but according to the constraints, the length should be between 3 and 20.
The age integer has a value of 16, while the range defined for it is 18–68.

When we correct the values according to the constraints, the code runs perfectly:

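Regular expression matching

Pydantic can also validate a string against a regular expression through the pattern keyword argument of constr(). Here’s an explanation of an example that uses a CheckEmail model to validate an email address this way: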

Line 1: We import the BaseModel class and the constr function from the pydantic library.
Lines 3–4: We define the Pydantic model class CheckEmail, which inherits from BaseModel. Its email field is declared with constr() and a pattern argument, so the string must match the given regular expression.
Lines 6–8: We create a dictionary called user_data with a single key, email.
Line 10: We call the CheckEmail.model_validate() method to validate the user_data dictionary.
Line 12: We print the email attribute of the user object.

The code above raises a ValidationError because user_data doesn’t contain a valid email address. If we correct the email address, the code works fine:

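Uniqueness checking

Pydantic doesn’t provide a built-in constraint for uniqueness across model instances, but we can implement one with a field validator that keeps track of the names it has already seen:

from pydantic import BaseModel, Field, field_validator

class User(BaseModel):
    name: str = Field(json_schema_extra={"unique": True})  # metadata only, not enforced

    __values__ = {}  # class-level registry of the users created so far
    
    def __init__(self, **data):
        super().__init__(**data)  # runs validation, including the check below
        self.__values__[self.name] = self

    @field_validator("name")
    def validate_unique_name(cls, value):
        if value in cls.__values__:
            raise ValueError("Duplicate names are not allowed")
        return value

def check_for_duplicates(user_data):
    duplicates = []
    for name in user_data:
        try:
            User(name=name)
        except ValueError:  # ValidationError is a subclass of ValueError
            duplicates.append(name)
    return duplicates

user_data = ["Tester1", "Tester1", "Tester2", "Tester2"]

duplicates = check_for_duplicates(user_data)
if duplicates:
    print("Duplicate names:")
    for name in duplicates:
        print(f"* {name}")
else:
    print("There are no duplicate names.")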

Here’s an explanation of the code above:

Line 1: We import the BaseModel class, the Field function, and the field_validator decorator from the pydantic library.
Line 3: We define a Pydantic model called User.
Line 4: The User model has a single field, name, of type str. The unique flag passed to Field() is only a marker. Pydantic doesn’t enforce it, so uniqueness is checked by the validator in lines 12–16.
Line 6: The User model also has a class-level __values__ attribute, which is a dictionary that maps each name to the User instance created with it.
Lines 8–10: The __init__() method of the User model adds each new User instance to the __values__ dictionary after it passes validation.
Lines 12–16: The validate_unique_name() field validator checks if the name field value is already present in the __values__ dictionary. If it is, the field validator raises a ValueError exception.
Lines 18–25: check_for_duplicates() checks for duplicate names in a list of names. It does this by trying to create a new User instance for each name in the list. If the User() constructor raises a ValidationError exception, which is a subclass of ValueError, then the name is already present in the __values__ dictionary and is therefore a duplicate.
Line 27: We create a list of names called user_data.
Line 29: We call the check_for_duplicates() function to check for duplicate names in the list.
Lines 30–35: We print the duplicate names, or a message saying that there are none.

When we run the above code, it displays the duplicates in the list. Now, when we remove the duplicates and run the code again, it shows that there are no duplicates in the list:

from pydantic import BaseModel, Field, field_validator

class User(BaseModel):
    name: str = Field(json_schema_extra={"unique": True})  # metadata only, not enforced

    __values__ = {}  # class-level registry of the users created so far

    def __init__(self, **data):
        super().__init__(**data)  # runs validation, including the check below
        self.__values__[self.name] = self

    @field_validator("name")
    def validate_unique_name(cls, value):
        if value in cls.__values__:
            raise ValueError("Duplicate names are not allowed")
        return value

def check_for_duplicates(user_data):
    duplicates = []
    for name in user_data:
        try:
            User(name=name)
        except ValueError:  # ValidationError is a subclass of ValueError
            duplicates.append(name)
    return duplicates

user_data = ["Tester1", "Tester2"]

duplicates = check_for_duplicates(user_data)
if duplicates:
    print("Duplicate names:")
    for name in duplicates:
        print(f"* {name}")
else:
    print("There are no duplicate names.")

Using uniqueness checking in Pydantic is a great way to ensure that our data is consistent and free of duplicates.

Benefits of Pydantic

Using Pydantic for type checking and data validation has a number of benefits, including:

Pydantic helps in writing more robust and maintainable code by ensuring that data is consistent and correct.
Pydantic can catch data validation errors early on before they cause problems in the application.
Pydantic makes it easy to define and validate data structures, which can save time and effort.

Limitations of Pydantic

Pydantic isn’t included in the Python standard library, so it requires a separate installation.
Pydantic can be more complex than necessary for simple programs, where plain type annotations or a few manual checks may be enough.
Pydantic adds some runtime overhead compared to plain classes because it validates and processes the data whenever a model is created.

Pydantic is a powerful and intuitive library for type checking and data validation in Python. It’s easy to use and can provide significant benefits for our code quality, error reduction, and productivity.