What is normalization/standardization in AI/ML?

As part of the data preprocessing phase (cleaning the data and getting it ready for use in machine learning models), it is desirable to normalize or standardize the dataset to get all the data points onto the same scale.

Normalization typically involves scaling all given numerical data points to a range of 0 to 1.
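
One common way to achieve this is min-max scaling (named here as an assumption, since other rescaling schemes exist), which maps each data point $x$ to

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x_{\min}$ and $x_{\max}$ are the smallest and largest values of the feature. Under this mapping, the minimum becomes 0 and the maximum becomes 1.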

Standardization typically involves applying the transformation

$z = \frac{x - \mu}{\sigma}$

to all the data points, where $x$ is the data point, $\mu$ is the mean of all data points, and $\sigma$ is the standard deviation of all the data points. The transformed data has a mean of 0 and a standard deviation of 1.
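
As a rough illustration of standardization, here is a minimal sketch in plain Python. The function name `standardize` and the sample ages are hypothetical, and the sketch assumes the population standard deviation is intended:

```python
from statistics import fmean, pstdev

def standardize(values):
    """Apply (x - mean) / std to every value in a list of numbers."""
    mu = fmean(values)            # mean of all data points
    sigma = pstdev(values, mu)    # population standard deviation (an assumption)
    if sigma == 0:
        # All values are identical, so the transformation is undefined;
        # returning zeros is one possible convention.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

ages = [20, 30, 40, 50, 60]       # made-up sample values
z = standardize(ages)
print(z)
```

The standardized values are centered on 0, so the middle age (40, which equals the mean) maps to 0.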

NOTE: In practice, the terms normalization and standardization are often used interchangeably, even though they refer to the two different transformations described above.

Why is normalization needed?

To illustrate this, let's consider an example. Let's say we have a feature called "number of miles driven by a person in a car." The realistic range for the data values of this feature would be 0 to 100,000. Now, let's say we have another feature called "age of person." The values for this feature would realistically lie between 0 and 100.

The ranges of these two features are not only vastly different from each other but also quite large. Large, mismatched values can hinder training: for example, in models trained with gradient-based methods, the feature with the larger range can dominate the updates. Hence, normalization is needed to ensure that the data we feed to our model lies within a small range and that each feature is on a comparable scale. This helps in the following:

Making the training process simpler and more stable.
Decreasing the training time, often significantly.

Code

The code below demonstrates how normalization can be done for an array of values:
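
A minimal sketch of min-max normalization in plain Python follows. The function name `min_max_normalize` and the sample mileage values are illustrative assumptions, not taken from a specific library:

```python
def min_max_normalize(values):
    """Rescale a list of numbers so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every value is the same, so there is no range to scale by;
        # returning zeros is one possible convention.
        return [0.0 for _ in values]
    span = hi - lo
    return [(x - lo) / span for x in values]

miles_driven = [0, 25_000, 50_000, 100_000]   # made-up sample values
normalized = min_max_normalize(miles_driven)
print(normalized)
```

Here the mileage values map to 0, 0.25, 0.5, and 1, so they now sit in the same 0 to 1 range that a normalized "age of person" feature would occupy.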
