What is normalization/standardization in AI/ML?

As part of the data preprocessing phase (cleaning the data and getting it ready for use in machine learning models), it is desirable to normalize or standardize the dataset so that all the data points are on the same scale.

Normalization typically involves scaling all given numerical data points to a range of 0 to 1.
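A common way to do this is min-max scaling, which maps each value x to (x - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1. The sketch below assumes NumPy; the helper name `min_max_normalize` is hypothetical, and returning zeros for a constant array (to avoid dividing by zero) is an assumed convention rather than a universal rule.

```python
import numpy as np

def min_max_normalize(values):
    """Hypothetical helper: linearly rescale values to the range [0, 1]."""
    arr = np.asarray(values, dtype=float)
    lo, hi = arr.min(), arr.max()
    if hi == lo:
        # Constant input: map every value to 0 (assumed convention)
        return np.zeros_like(arr)
    return (arr - lo) / (hi - lo)

ages = [18, 40, 62, 95]
result = min_max_normalize(ages)
print(result)  # 18 maps to 0, 95 maps to 1, the rest fall in between
```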

Standardization typically involves applying the transformation

$$\frac{x - \mu}{\sigma}$$

to all the data points, where $x$ is the data point, $\mu$ is the mean of all data points, and $\sigma$ is the standard deviation of all the data points.
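As a small worked example of this formula, take the values 2, 4, and 6: the mean is 4 and the (population) standard deviation is sqrt(8/3), about 1.633, so the standardized values are about -1.22, 0, and 1.22. The sketch below assumes NumPy and a hypothetical helper `standardize`; it uses NumPy's default population standard deviation (`ddof=0`) and assumes the values are not all identical, since that would make sigma zero.

```python
import numpy as np

def standardize(values):
    """Hypothetical helper: apply (x - mu) / sigma to every value."""
    arr = np.asarray(values, dtype=float)
    mu = arr.mean()     # mean of all data points
    sigma = arr.std()   # population standard deviation (NumPy default, ddof=0)
    return (arr - mu) / sigma

z = standardize([2.0, 4.0, 6.0])
print(z)
# After standardization the data has mean 0 and standard deviation 1
print(z.mean(), z.std())
```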

NOTE: In practice, the terms normalization and standardization are often used interchangeably.

Why is normalization needed?

To illustrate this, let's consider an example. Let's say we have a feature called "number of miles driven by a person in a car." The realistic range for the data values of this feature would be 0 to 100,000. Now, let's say we have another feature called "age of person." The values for this feature would realistically lie between 0 and 100.

Not only are the ranges of the two features vastly different from each other, but they are also quite large. Large, unevenly scaled values can hinder training: the feature with the larger range tends to dominate computations such as distances and gradient updates, and optimizers such as gradient descent may converge slowly. Normalization is therefore needed so that the data we feed to our model lies within a small, comparable range and no single value is disproportionately large. This helps in the following ways:

  • Making the training process more stable and easier to optimize.

  • Often reducing the training time significantly.
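To make the effect of mismatched ranges concrete, the sketch below compares Euclidean distances between three hypothetical people described by the two features above. Distance is a swapped-in illustration (the text discusses training in general), the data points are invented, and the scaling uses the realistic ranges stated earlier (0 to 100,000 miles, 0 to 100 years) as assumed minimum and maximum values.

```python
import numpy as np

# Hypothetical people, each described as [miles driven, age]
people = np.array([
    [50_000.0, 25.0],  # A
    [51_000.0, 70.0],  # B: similar mileage to A, very different age
    [60_000.0, 26.0],  # C: different mileage from A, similar age
])

def euclidean(u, v):
    return float(np.linalg.norm(u - v))

# Raw units: the miles feature dominates the distance
raw_ab = euclidean(people[0], people[1])
raw_ac = euclidean(people[0], people[2])

# Min-max scale each feature using the assumed ranges from the text
lo = np.array([0.0, 0.0])
hi = np.array([100_000.0, 100.0])
scaled = (people - lo) / (hi - lo)
scaled_ab = euclidean(scaled[0], scaled[1])
scaled_ac = euclidean(scaled[0], scaled[2])

print(f"raw:    d(A,B)={raw_ab:.2f}  d(A,C)={raw_ac:.2f}")
print(f"scaled: d(A,B)={scaled_ab:.3f}  d(A,C)={scaled_ac:.3f}")
```

In raw units, the 45-year age gap between A and B barely registers, so B looks closest to A. After scaling, both features contribute, and C becomes the closer one.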

Code

The code below demonstrates how normalization can be done for an array of values. Strictly speaking, it applies the standardization formula given above:

import numpy as np
data_arr = [1.3, 57.0, 129.45, 3.9, 7.11, 1010.3, 30.7, 14.6, 0.0, 113435.27]
# converting the list to a numpy array:
data_arr = np.array(data_arr)
print("Data BEFORE normalization:")
print(data_arr)
# Compute the mean and standard deviation once, from the original data
# (recomputing them while modifying the array would give wrong results):
mean = np.mean(data_arr)
std = np.std(data_arr)
# Normalization (z-score standardization), applied to every element at once:
data_arr = (data_arr - mean) / std
print("\nData AFTER normalization:")
print(data_arr)
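The example above works on a single list of values. With several features, such as the miles-driven and age features discussed earlier, each column is usually standardized with its own mean and standard deviation. A minimal sketch, assuming NumPy and a small hypothetical dataset:

```python
import numpy as np

# Hypothetical dataset: each row is a person, columns are [miles driven, age]
X = np.array([
    [12_000.0, 23.0],
    [48_500.0, 35.0],
    [91_000.0, 52.0],
    [30_250.0, 67.0],
])

# axis=0 computes one mean and one standard deviation per column (feature)
mu = X.mean(axis=0)
sigma = X.std(axis=0)

# Broadcasting subtracts and divides each column by its own statistics
X_std = (X - mu) / sigma

print(X_std)
print(X_std.mean(axis=0))  # close to 0 for each column
print(X_std.std(axis=0))   # close to 1 for each column
```

Scaling per column keeps the large miles values from swamping the age values, which is the point of the example in "Why is normalization needed?".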

Explanation

As you can see in the code above, we subtract the mean from each value and then divide by the standard deviation to implement the standardization formula mentioned above.


Copyright ©2025 Educative, Inc. All rights reserved