What is Neural Architecture Search (NAS)?

Overview

As beginners to deep learning, we often wonder how experts know which layer to place where and how many parameters a neural network needs. Beginners sometimes combine different operations by trial and error and waste a lot of time. The architectures of the famous neural networks were all hand-designed by experts, and developing a complex architecture that delivers state-of-the-art performance can be a cumbersome task.

Neural Architecture Search (NAS) is a method for automatically finding the optimal neural network architecture. It consists of three main components:

  • Search space
  • Search strategy
  • Performance estimation strategy

These three components work together to find the optimal architecture.

The working of NAS

Search space

The search space restricts the set of architectures we are willing to consider. Without such a restriction, we would be searching over an infinite number of possible architectures, so the space must be defined explicitly. The search space is parameterized by the following:

  • The maximum number of layers n.
  • The types of operations (e.g., convolution, pooling, etc.).
  • The hyperparameters considered for each operation. Depending on the operation, these include the number of neurons, kernel size, number of kernels, stride, padding, and others.

Note: The search space changes based on the operation in use. This makes it a conditional space.
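To make this concrete, here is a minimal sketch in plain Python of how such a conditional search space could be encoded and sampled. The operation names, hyperparameter ranges, and the sample_architecture helper are illustrative assumptions, not part of any particular NAS library:

```python
import random

# A hypothetical conditional search space: each operation type carries its own
# hyperparameters, so the keys we sample depend on the chosen operation.
SEARCH_SPACE = {
    "max_layers": 8,
    "operations": {
        "conv": {"num_kernels": [16, 32, 64], "kernel_size": [3, 5], "stride": [1, 2]},
        "pool": {"kernel_size": [2, 3], "stride": [1, 2]},
        "dense": {"num_neurons": [64, 128, 256]},
    },
}

def sample_architecture(space):
    """Randomly draw one architecture description from the search space."""
    num_layers = random.randint(1, space["max_layers"])
    layers = []
    for _ in range(num_layers):
        op = random.choice(list(space["operations"]))
        # Only the hyperparameters that apply to this operation are sampled.
        hparams = {name: random.choice(values)
                   for name, values in space["operations"][op].items()}
        layers.append((op, hparams))
    return layers

print(sample_architecture(SEARCH_SPACE))
```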

We restrict the search space so that it remains feasible to explore the possible combinations of these layers. Based on how the layers are combined, the following types of networks are formed:

  • Chain-structured networks
  • Residual networks
  • Dense networks

Chain-structured networks

In chain-structured networks, we simply stack layers. As shown in the figure below, the i-th layer takes its input from the (i-1)-th layer.

The chain-structured network
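As a rough illustration (using PyTorch, which the article does not prescribe, and arbitrary layer choices), a chain-structured network is just a plain stack in which each layer feeds the next:

```python
import torch
import torch.nn as nn

# A simple chain: layer i consumes only the output of layer i-1.
chain_net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),
)

x = torch.randn(1, 3, 32, 32)   # a dummy image batch
print(chain_net(x).shape)       # torch.Size([1, 10])
```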

Residual networks

Residual networks or ResNets have residual blocks that are like shortcut blocks. We'll add the output of a previous layer to the layer ahead. The connection of layers in a residual network is illustrated in the figure below.

The residual network
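A minimal sketch of a residual block (again in PyTorch, with arbitrary channel sizes) makes the shortcut explicit: the block's input is added to its output.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two conv layers whose output is added to the block's input (the shortcut)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)   # add the shortcut, then activate

x = torch.randn(1, 16, 32, 32)
print(ResidualBlock(16)(x).shape)   # torch.Size([1, 16, 32, 32])
```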

Dense networks

Dense networks, more commonly known as DenseNets, are just like residual networks, except that instead of adding the output of a previous layer to a later one, they concatenate the outputs.

The dense network
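The only structural change from the residual sketch above is the combine step: feature maps are concatenated along the channel dimension rather than added. A rough sketch (channel sizes and layer count are arbitrary):

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer receives the concatenation of all previous feature maps."""
    def __init__(self, in_channels, growth, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(
                nn.Sequential(
                    nn.Conv2d(channels, growth, kernel_size=3, padding=1),
                    nn.ReLU(),
                )
            )
            channels += growth   # concatenation grows the channel count

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))   # concatenate, don't add
            features.append(out)
        return torch.cat(features, dim=1)

x = torch.randn(1, 16, 32, 32)
print(DenseBlock(16, growth=8)(x).shape)   # torch.Size([1, 40, 32, 32])
```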

Search strategy

We define a search strategy to explore the search space. Our goal is to find the optimal architecture quickly, and naive search techniques can take a lot of time. Some widely used techniques are random search, gradient-based methods, Bayesian optimization, and others.
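For instance, the simplest of these, random search, just samples architectures from the search space, evaluates each one, and keeps the best. A hedged sketch, reusing the hypothetical SEARCH_SPACE and sample_architecture from the earlier snippet and an assumed evaluate callback:

```python
import random

def random_search(space, evaluate, num_trials=20):
    """Sample architectures at random and keep the best-scoring one.

    `evaluate` is assumed to return a validation score for an architecture;
    in practice it would train (or partially train) the candidate model.
    """
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture(space)     # helper from the earlier sketch
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

# Example usage with a dummy evaluator that just returns a random score.
best, score = random_search(SEARCH_SPACE, evaluate=lambda arch: random.random())
print(score, best)
```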

Performance estimation strategy

When searching for the optimal architecture, we need some way to compare the performance of candidate architectures. The simplest approach would be to fully train and test each model, but this requires an enormous amount of computation and can take thousands of GPU days to complete.

There are a number of approaches that researchers have employed, but the most intuitive is to extrapolate the learning curves of the models from their initial stage of training.
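As a toy illustration of that idea (not any particular published extrapolation method), one could fit a simple saturating curve to the first few epochs of validation accuracy and project it forward. The curve form and the sample numbers below are illustrative assumptions:

```python
import numpy as np

def extrapolate_accuracy(early_accuracies, target_epoch):
    """Fit a crude saturating curve acc(t) = a + b / t to the observed
    early epochs and use it to predict accuracy at a later epoch."""
    epochs = np.arange(1, len(early_accuracies) + 1)
    # Linear least squares in the basis [1, 1/t].
    design = np.column_stack([np.ones(len(epochs)), 1.0 / epochs])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(early_accuracies), rcond=None)
    a, b = coeffs
    return a + b / target_epoch

# Validation accuracies from the first five epochs of a hypothetical candidate.
early = [0.42, 0.55, 0.61, 0.64, 0.66]
print(round(extrapolate_accuracy(early, target_epoch=100), 3))
```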

Note: To learn more about NAS and related practices, please refer to the research article by Elsken et al., "Neural architecture search: A survey."

