What are normalization forms in a database?

Normalization is a design principle in a database to organize structured data. Normalization aims to eliminate data redundancy and anomalies to maintain data integrity.

Why normalization?

A database is known as a relation. A large database causes data redundancy which makes it difficult to maintain and update records and causes a waste of resources. A large relation also introduces insertion, updatation, and deletion anomalies.

Let's consider an employees table having the schema (employee_id, name, work_code, work, country_code, country) as shown in the table below.

Initial relation

EMPLOYEE_ID

NAME

WORK_CODE

WORK

COUNTRY

_CODE


COUNTRY

E001



Alia


S-01

Scientist

12

Canada

E001



Alia

T-34

Teacher

12

Canada

E002



Janice

T-34

Teacher

37

India

E002



Janice

A-73

Artist

37

India

E003


Alia

S-01

Scientist

37

India

Now we can understand the data redundancies and anomalies in the table above.

  • Data redundancy: The table contains redundant data for employees with multiple work codes. For example, both rows for employee E001 (Alia) have the same employee name and country code but different work codes and work. Similarly, employee E002 (Janice) appears twice with different work codes and work but the same employee name and country code. Storing this redundant data increases storage space usage and creates the risk of inconsistent updates.

  • Insertion anomaly: Suppose a new employee is added to the database with a specific work code, but there is no work assigned yet. Then it would be necessary to insert a null value or placeholder for the work column, which will result in incomplete data.

  • Deletion anomaly: If an employee with a unique combination of attributes ( employee ID and work code) is deleted, all the information associated with that employee, such as the work and country code, would also be lost.

  • Update anomaly: If the country code for an employee changes, it would require updating multiple rows, which will lead to inconsistencies if the update is not performed uniformly.

To resolve these issues in our relation, normalization forms are used.

Types of normal forms

The normalization process has many forms, each with its own rules and requirements. With each higher normal form, the rules get more strict, as shown in the diagram below.

Normalization forms
Normalization forms

There are many normal forms. Here's a brief description of all normal forms.

Normal forms

Normalization Form

Description

1-NF

Atomic values

2-NF

Relation in 1-NF and no partial dependency

3-NF

Relation in 2-NF and no transitive dependency

BCNF

Relation in 3-NF, And, for any dependency A → B, A should be a super key


4-NF

 Relation in BCNF and has no multi-valued dependency

5-NF

Relation in 4NF and does not contain any join dependency (loseless joining)

In this Answer, we will discuss the three most widely used normalization forms:

  • First normal form (1NF)

  • Second normal form (2NF)

  • Third normal form (3NF)

Let's get a detailed overview of these normal forms with the employee relation (initial table).

First normal form (1-NF)

It is the most simple form of normalization, it's rules are:

  • Each attribute (column) in a table should hold atomic values.

  • There should be no repeating groups or arrays of values within a row.

To convert the table into 1NF, we need to ensure that each column contains atomic values and there are no repeating groups. The initial table is already in 1-NF, so no changes are made.

Second normal form (2-NF)

  • The table must satisfy 1NF.

  • Partial dependencies need to be eliminated i.e. all non-key attributes (columns) must be fully dependent on the entire primary key.

To achieve 2NF, we need to identify and remove any partial dependencies.

Employee_role

EMPLOYEE_ID

WORK_CODE

E001



S-01

E001



T-34

E002



T-34

E002



A-73

E003


S-01

Employee_states

EMPLOYEE_ID

NAME

COUNTRY

_CODE


COUNTRY

E001



Alia


12

Canada

E001



Alia

12

Canada

E002



Janice

37

India

E002



Janice

37

India

E003


Alia

37

India

Work_table

WORK_CODE

WORK

S-01

Scientist

T-34

Teacher

T-34

Teacher

A-73

Artist

S-01

Scientist

The new tables separate the employee-specific details from the employee details that can be associated with multiple work. To take this a step further, we will separate them again into a different table to make it 3NF.

Third normal form (3-NF)

To achieve 3NF, we identify and remove any transitive dependencies.

In the table, we can observe that the country of an employee is functionally dependent on both the employee ID (EMPLOYEE_ID) and the work code. The employee's country can be determined by the combination of these two attributes. However, the employee ID and the work code are not functionally dependent on each other. So, the transitive dependency lies between the employee ID and the country with respect to the work code.

The two tables, "employee_role" and "work_table", remain the same, but the "employee_states" table is to be split.

Employee_table

EMPLOYEE_ID

NAME

COUNTRY

_CODE


E001



Alia


12

E001



Alia

12

E002



Janice

37

E002



Janice

37

E003


Alia

37

States_table

COUNTRY

_CODE


COUNTRY

12

Canada

12

Canada

37

India

37

India

37

India

Now our database is in 3NF.

Note: BCNF is also a popular normalization form.

Conclusion

In conclusion, the process of normalization plays a crucial role in database design to eliminate data redundancies, maintain data integrity, and improve database efficiency. It is important to note that the level of normalization to be achieved depends on the specific requirements and trade-offs of the application.

Free Resources

Copyright ©2025 Educative, Inc. All rights reserved