How to convert an array of indices to one-hot encoded NumPy array

Overview

One-hot encoding is a very popular technique used in machine learning to convert categorical data, like red, blue, and green, into binary values of $0$ and $1$ for machine learning algorithms to use.

NumPy arrays, like any other array, can be indexed based on the indices of the elements. A high-level representation of a NumPy array being converted into a one-hot encoded 2-D array is as follows:

Method

One-hot encoding creates a 2-D array whose number of rows is equal to the size of the original array and number of columns is equal to the max element in the 1-D array added to $1$ . In the example above, the number of rows is $3$ (the number of elements in a 1-D array) and the number of columns is $5$ (max element added to $1$ or $4 + 1$ ). In each row, the binary number $1$ is stored against the number in the original array, now treated as an index. For example, in the one-hot encoded array above, $1$ is stored on the 1^st index in row $1$ for the number $1$ as well as on the 4^th index in row $2$ for the number $4$ .

Let's look at the step-by-step transformation of another simple example below:

Explanation

Line 1: We import NumPy to use functions from this library.
Line 4: We declare a simple array of numbers.
Line 7: We initialize a 2-D array of $0$ 's using the numpy.zeros function, which takes the shape (rows, columns) as its first argument and the data type as its second argument. As mentioned earlier, the rows are equal to the length of the original array and the columns are equal to the value of the max element added to 1. In the code above, both the rows and columns are equal to 3. The data type of the array is specified as int.
Line 10: We use the numpy.arange function to create a range of integers using the size of the original array.

In the example above, numpy.arange will return [0 1 2]. This will be used to loop over the rows of the 2-D array. Each number in the original array is used as an index to add $1$ to the 2-D array. Notice that in row 1, $1$ is added for the number $0$ on the 0^th index and so on.

How to convert an array of indices to one-hot encoded NumPy array

Overview

Method

Code

Explanation