As part of the data preprocessing step, numerical features are usually rescaled before being fed to a model. Two common techniques for this are normalization and standardization.
Normalization typically involves scaling all given numerical data points to a range of 0 to 1.
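As a sketch, min-max normalization can be implemented directly with NumPy (the array values below are illustrative, not from the dataset discussed later):

```python
import numpy as np

# illustrative values only
data = np.array([15.0, 3.2, 88.5, 0.0, 42.1])

# Min-max normalization: shift by the minimum, then divide by the range,
# so the smallest value maps to 0 and the largest maps to 1
normalized = (data - data.min()) / (data.max() - data.min())

print(normalized)
```

After this transformation, every value lies in the range 0 to 1, regardless of the original scale of the feature.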
Standardization typically involves applying the transformation z = (x − μ) / σ, where μ is the mean of the data and σ is its standard deviation.
NOTE: In practice, the terms normalization and standardization are often used interchangeably.
To illustrate this, let's consider an example. Let's say we have a feature called "number of miles driven by a person in a car." The realistic range for the data values of this feature would be 0 to 100,000. Now, let's say we have another feature called "age of person." The values for this feature would realistically lie between 0 and 100.
Not only are the ranges of the two features vastly different from each other, but they are also quite large. Large data values can hinder the training process of our model. Hence, normalization is needed to ensure that the data we feed to our model lies within a small range and that no single value is too large. This helps in the following:
Making the training process much less complex.
Significantly decreasing the training time.
The code below demonstrates how normalization can be done for an array of values:
import numpy as np

data_arr = [1.3, 57.0, 129.45, 3.9, 7.11, 1010.3, 30.7, 14.6, 0.0, 113435.27]

# converting the list to a numpy array:
data_arr = np.array(data_arr)

print("Data BEFORE normalization:")
print(data_arr)

# Normalization: compute the mean and standard deviation once, up front,
# so they are not recomputed from partially transformed values inside the loop
mean = np.mean(data_arr)
std = np.std(data_arr)

for i in range(0, data_arr.shape[0]):
    data_arr[i] = (data_arr[i] - mean) / std

print("\nData AFTER normalization:")
print(data_arr)
As you can see in the code above, we subtract the mean from each value and then divide by the standard deviation, implementing the formula mentioned above.
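A quick sanity check for this transformation: after standardization, the data should have a mean of approximately 0 and a standard deviation of approximately 1. A minimal sketch using the same values as above, with the loop replaced by an equivalent vectorized NumPy expression:

```python
import numpy as np

data = np.array([1.3, 57.0, 129.45, 3.9, 7.11, 1010.3, 30.7, 14.6, 0.0, 113435.27])

# vectorized standardization: equivalent to the element-wise loop above
standardized = (data - np.mean(data)) / np.std(data)

# the standardized data has mean ~0 and standard deviation ~1
print(standardized.mean())
print(standardized.std())
```

If these two numbers are not close to 0 and 1 (up to floating-point error), the implementation has a bug.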