How to convert an array of indices to a one-hot encoded NumPy array

Overview

One-hot encoding is a popular technique in machine learning for converting categorical data, such as red, blue, and green, into binary vectors of 0s and 1s that machine learning algorithms can use.
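As a rough illustration of the idea, the sketch below turns colour names into integer indices and then into one-hot rows. The sample data and the variable names are assumptions made for this example; numpy.unique with return_inverse=True is used only as one convenient way to obtain an index per category.

```python
import numpy as np

# Hypothetical categorical data; the colour names come from the text above.
colors = np.array(["red", "green", "blue", "green"])

# np.unique returns the sorted distinct categories and, with
# return_inverse=True, the index of each element's category.
categories, indices = np.unique(colors, return_inverse=True)

# One row per sample, one column per category, a single 1 in each row.
one_hot = np.zeros((colors.size, categories.size), dtype=int)
one_hot[np.arange(colors.size), indices] = 1

print(categories)  # ['blue' 'green' 'red']
print(indices)     # [2 1 0 1]
print(one_hot)
```

Each row contains exactly one 1, in the column that belongs to that sample's category.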

NumPy arrays, like other arrays, can be indexed by the positions of their elements. A high-level representation of a 1-D NumPy array being converted into a one-hot encoded 2-D array is as follows:

How a simple array would look after being converted to a one-hot encoded array

Method

One-hot encoding creates a 2-D array whose number of rows equals the size of the original array and whose number of columns equals the max element of the 1-D array plus 1. In the example above, the number of rows is 3 (the number of elements in the 1-D array) and the number of columns is 5 (the max element plus 1, or 4 + 1). In each row, a 1 is stored at the position given by the corresponding number in the original array, which is now treated as an index. For example, in the one-hot encoded array above, a 1 is stored at index 1 in row 1 for the number 1, and at index 4 in row 2 for the number 4.
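The shape rule and the placement of each 1 can be sketched with an explicit loop. The array values here are partly an assumption: the text mentions only the numbers 1 and 4, so the third value is a placeholder chosen for illustration.

```python
import numpy as np

# Assumed example: only the values 1 and 4 are stated in the text;
# the third value (2) is a placeholder.
indices = np.array([1, 4, 2])

n_rows = indices.size       # one row per element of the 1-D array
n_cols = indices.max() + 1  # max element plus 1

encoded = np.zeros((n_rows, n_cols), dtype=int)

# Treat each value as a column index and store a 1 there.
for row, col in enumerate(indices):
    encoded[row, col] = 1

print(encoded.shape)  # (3, 5)
print(encoded)
```

The loop is written out only to make each step visible; it produces the same kind of array as a vectorized assignment.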

Let's look at the step-by-step transformation of another simple example below:

Conversion of array of indices to one-hot encoded array

Code

Now let's see the method in action in Python using NumPy. NumPy provides functions, such as numpy.zeros and numpy.arange, that make this transformation concise and efficient. Have a look at the code below and change the values to see how the conversion changes as a result.

import numpy as np

# Create a 1-D array of indices
simple_array = np.array([0, 2, 1])

# Create a 2-D array filled with 0s
encoded_array = np.zeros((simple_array.size, simple_array.max() + 1), dtype=int)

# Replace 0 with 1 at the index given by each element of the original array
encoded_array[np.arange(simple_array.size), simple_array] = 1

print(encoded_array)
Code example showing conversion of an array into a one-hot encoded array
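One caveat with sizing the columns from the max element is that the result has too few columns whenever the largest possible category does not occur in the data. The sketch below wraps the same steps in a function with an optional column count. The names one_hot and num_classes are hypothetical and chosen for this example; they are not part of NumPy.

```python
import numpy as np

def one_hot(indices, num_classes=None):
    """Return a 2-D one-hot array for a 1-D array of non-negative integers.

    `one_hot` and `num_classes` are illustrative names, not NumPy API.
    """
    indices = np.asarray(indices)
    if indices.ndim != 1 or indices.size == 0:
        raise ValueError("indices must be a non-empty 1-D array")
    if indices.min() < 0:
        raise ValueError("indices must be non-negative")
    if num_classes is None:
        # Same rule as the example above: max element plus 1.
        num_classes = int(indices.max()) + 1
    elif indices.max() >= num_classes:
        raise ValueError("an index is out of range for num_classes")

    encoded = np.zeros((indices.size, num_classes), dtype=int)
    encoded[np.arange(indices.size), indices] = 1
    return encoded

print(one_hot([0, 2, 1]))                 # 3 columns, inferred from the data
print(one_hot([0, 2, 1], num_classes=4))  # 4 columns, last one all zeros
```

Passing the column count explicitly keeps the width of the encoding stable across different batches of data.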

Explanation

  • Line 1: We import NumPy to use functions from this library.
  • Line 4: We declare a simple array of numbers.
  • Line 7: We initialize a 2-D array of 0s using the numpy.zeros function, which takes the shape (rows, columns) as its first argument and the data type as its second argument. As mentioned earlier, the number of rows equals the length of the original array and the number of columns equals the max element plus 1. In the code above, the array has 3 elements and its max element is 2, so there are 3 rows and 3 columns. The data type of the array is specified as int.
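  • Line 10: We use the numpy.arange function to create a range of integers from 0 up to, but not including, the size of the original array. This range supplies the row indices and the original array supplies the column indices of the cells that are set to 1.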

In the example above, numpy.arange will return [0 1 2]. These values are used as the row indices of the 2-D array, and each number in the original array is used as the column index at which a 1 is stored. No explicit loop is needed, because NumPy pairs the two index arrays element by element. Notice that in row 1, a 1 is stored at index 0 for the number 0, and so on.
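To make the pairing of row and column indices concrete, the sketch below lists the (row, column) pairs that receive a 1. It then compares the result with a different technique, indexing the rows of an identity matrix built with numpy.eye, which is shown only as a cross-check and is not the method used in this article.

```python
import numpy as np

simple_array = np.array([0, 2, 1])
rows = np.arange(simple_array.size)

# Each (row, column) pair names one cell that receives a 1.
pairs = [(int(r), int(c)) for r, c in zip(rows, simple_array)]
print(pairs)  # [(0, 0), (1, 2), (2, 1)]

encoded_array = np.zeros((simple_array.size, simple_array.max() + 1), dtype=int)
encoded_array[rows, simple_array] = 1

# Alternative technique: pick rows of an identity matrix by index.
via_eye = np.eye(simple_array.max() + 1, dtype=int)[simple_array]

print(np.array_equal(encoded_array, via_eye))  # True
```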

    Copyright ©2025 Educative, Inc. All rights reserved