What are normalization forms in a database?

Normalization is a design principle in a database to organize structured data. Normalization aims to eliminate data redundancy and anomalies to maintain data integrity.

Why normalization?

A database is known as a relation. A large database causes data redundancy which makes it difficult to maintain and update records and causes a waste of resources. A large relation also introduces insertion, updatation, and deletion anomalies.

Let's consider an employees table having the schema (employee_id, name, work_code, work, country_code, country) as shown in the table below.

Now we can understand the data redundancies and anomalies in the table above.

Data redundancy: The table contains redundant data for employees with multiple work codes. For example, both rows for employee E001 (Alia) have the same employee name and country code but different work codes and work. Similarly, employee E002 (Janice) appears twice with different work codes and work but the same employee name and country code. Storing this redundant data increases storage space usage and creates the risk of inconsistent updates.
Insertion anomaly: Suppose a new employee is added to the database with a specific work code, but there is no work assigned yet. Then it would be necessary to insert a null value or placeholder for the work column, which will result in incomplete data.
Deletion anomaly: If an employee with a unique combination of attributes ( employee ID and work code) is deleted, all the information associated with that employee, such as the work and country code, would also be lost.
Update anomaly: If the country code for an employee changes, it would require updating multiple rows, which will lead to inconsistencies if the update is not performed uniformly.

To resolve these issues in our relation, normalization forms are used.

Types of normal forms

The normalization process has many forms, each with its own rules and requirements. With each higher normal form, the rules get more strict, as shown in the diagram below.

In this Answer, we will discuss the three most widely used normalization forms:

First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)

Let's get a detailed overview of these normal forms with the employee relation (initial table).

First normal form (1-NF)

It is the most simple form of normalization, it's rules are:

Each attribute (column) in a table should hold atomic values.
There should be no repeating groups or arrays of values within a row.

To convert the table into 1NF, we need to ensure that each column contains atomic values and there are no repeating groups. The initial table is already in 1-NF, so no changes are made.

Second normal form (2-NF)

The table must satisfy 1NF.
Partial dependencies need to be eliminated i.e. all non-key attributes (columns) must be fully dependent on the entire primary key.

To achieve 2NF, we need to identify and remove any partial dependencies.

The new tables separate the employee-specific details from the employee details that can be associated with multiple work. To take this a step further, we will separate them again into a different table to make it 3NF.

Third normal form (3-NF)

To achieve 3NF, we identify and remove any transitive dependencies.

In the table, we can observe that the country of an employee is functionally dependent on both the employee ID (EMPLOYEE_ID) and the work code. The employee's country can be determined by the combination of these two attributes. However, the employee ID and the work code are not functionally dependent on each other. So, the transitive dependency lies between the employee ID and the country with respect to the work code.

The two tables, "employee_role" and "work_table", remain the same, but the "employee_states" table is to be split.

Free AI Mock Interviews

Coding Interview

Coding PatternsFree Interview

Gain insights and practical experience with coding patterns through targeted MCQs and coding problems, designed to match and challenge your expertise level.

System Design

YouTubeFree Interview

Learn to design a video streaming platform like YouTube by tackling functional and non-functional requirements, core components, and high-level to detailed design challenges.

Free Resources

EMPLOYEE_ID	NAME	WORK_CODE	WORK	COUNTRY _CODE	COUNTRY
E001	Alia	S-01	Scientist	12	Canada
E001	Alia	T-34	Teacher	12	Canada
E002	Janice	T-34	Teacher	37	India
E002	Janice	A-73	Artist	37	India
E003	Alia	S-01	Scientist	37	India

Normalization Form	Description
1-NF	Atomic values
2-NF	Relation in 1-NF and no partial dependency
3-NF	Relation in 2-NF and no transitive dependency
BCNF	Relation in 3-NF, And, for any dependency A → B, A should be a super key
4-NF	Relation in BCNF and has no multi-valued dependency
5-NF	Relation in 4NF and does not contain any join dependency (loseless joining)

EMPLOYEE_ID	WORK_CODE
E001	S-01
E001	T-34
E002	T-34
E002	A-73
E003	S-01

EMPLOYEE_ID	NAME	COUNTRY _CODE	COUNTRY
E001	Alia	12	Canada
E001	Alia	12	Canada
E002	Janice	37	India
E002	Janice	37	India
E003	Alia	37	India

WORK_CODE	WORK
S-01	Scientist
T-34	Teacher
T-34	Teacher
A-73	Artist
S-01	Scientist

What are normalization forms in a database?

Why normalization?

Initial relation

Types of normal forms

Normal forms

First normal form (1-NF)

Second normal form (2-NF)

Employee_role

Employee_states

Work_table

Third normal form (3-NF)

Employee_table

States_table

Conclusion