One-hot encoding is a popular technique in machine learning for converting categorical data, such as red, blue, and green, into binary values of 0 and 1.
NumPy arrays, like any other array, can be indexed based on the indices of the elements. A high-level representation of a NumPy array being converted into a one-hot encoded 2-D array is as follows:
One-hot encoding creates a 2-D array whose number of rows equals the size of the original array and whose number of columns equals the max element of the 1-D array plus 1.
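As a quick check of this shape rule, here is a minimal sketch in plain Python. The list `labels` and its values are made up for illustration and assume non-negative integer labels starting at 0.

```python
# Hypothetical input: integer category labels (illustrative values only)
labels = [3, 0, 1]

rows = len(labels)        # one row per element of the original array
cols = max(labels) + 1    # labels run from 0 to the max, so max + 1 columns

print((rows, cols))  # (3, 4)
```

The extra column is needed because the label 0 also occupies a column of its own.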
Let's look at the step-by-step transformation of another simple example below:
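The transformation can be traced one row at a time with a small sketch in plain Python. The input `values` is a hypothetical array chosen for illustration, and the loop is only one possible way to spell out the steps.

```python
# Hypothetical input array (illustrative values only)
values = [1, 0, 3]

# Step 1: start from a grid of zeros with len(values) rows
# and max(values) + 1 columns
encoded = [[0] * (max(values) + 1) for _ in values]

# Step 2: in each row, set the column named by that row's value to 1
for row, value in enumerate(values):
    encoded[row][value] = 1
    print(f"after row {row} (value {value}): {encoded}")
```

After the last step, every row contains exactly one 1, located at the column given by the corresponding element of `values`.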
Now let's see the method in action in Python using NumPy. NumPy provides functions, such as numpy.zeros and numpy.arange, that make this transformation efficient. Have a look at the code below and change the values to see how the result changes.
    import numpy as np

    # creating an array
    simple_array = np.array([0, 2, 1])

    # creating a 2D array filled with 0's
    encoded_array = np.zeros((simple_array.size, simple_array.max() + 1), dtype=int)

    # replacing 0 with a 1 at the index of the original array
    encoded_array[np.arange(simple_array.size), simple_array] = 1

    print(encoded_array)
We import NumPy to use functions from this library.

We use the numpy.zeros function, which takes the shape (rows, columns) as its first argument and the data type as its second argument. As mentioned earlier, the number of rows equals the length of the original array and the number of columns equals the value of the max element plus 1. In the code above, both the rows and columns are equal to 3. The data type of the array is specified as int.

We use the numpy.arange function to create a range of integers using the size of the original array. In the example above, numpy.arange will return [0 1 2]. This supplies one row index per element, so the assignment touches each row of the 2-D array once. Each number in the original array is used as the column index at which a 1 is placed in that row.
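To make the indexing step concrete, the following sketch shows how the two index arrays pair up element by element, and checks that the vectorized assignment matches an explicit loop. The names `rows`, `pairs`, `vectorized`, and `looped` are introduced here for illustration only and are not part of the original code.

```python
import numpy as np

simple_array = np.array([0, 2, 1])
rows = np.arange(simple_array.size)   # row indices: [0 1 2]

# Indexing with two integer arrays pairs them element-wise,
# giving one (row, column) position per element
pairs = list(zip(rows.tolist(), simple_array.tolist()))
print(pairs)  # [(0, 0), (1, 2), (2, 1)]

# Vectorized assignment, as in the code above
vectorized = np.zeros((simple_array.size, simple_array.max() + 1), dtype=int)
vectorized[rows, simple_array] = 1

# Equivalent explicit loop, for comparison only
looped = np.zeros_like(vectorized)
for r, c in pairs:
    looped[r, c] = 1

print(np.array_equal(vectorized, looped))  # True
```

The vectorized form avoids a Python-level loop, which is why it tends to be the preferred spelling for larger arrays.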