What is Neural Architecture Search (NAS)?

Overview

As beginners to deep learning, we often wonder how experts know which layer to place where and how many parameters a neural network needs. Beginners sometimes combine different operations by trial and error and waste a lot of time. The architectures of the famous neural networks were all hand-designed by experts, and developing a complex architecture that delivers state-of-the-art performance can be a cumbersome task.

Neural Architecture Search (NAS) is a method for automatically finding the optimal neural network architecture. It consists of three main components:

  • Search space
  • Search strategy
  • Performance estimation strategy

These three components work together to find the optimal architecture.

The working of NAS

Search space

The search space restricts the set of architectures we are willing to consider. Without such a restriction, we would be searching over an infinite number of possible architectures, so the space must be defined explicitly. The search space is parameterized by the following:

  • The maximum number of layers n.
  • The types of operations (e.g., convolution, pooling, etc.).
  • The hyperparameters considered for each operation. Depending on the operation, these include the number of neurons, kernel size, number of kernels, stride, padding, and others.

Note: The search space changes based on the operation in use. This makes it a conditional space.
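To make this concrete, here is a minimal sketch in plain Python of how such a conditional search space could be encoded and sampled. The operation names, hyperparameter ranges, and the sample_architecture helper are illustrative assumptions, not part of any particular NAS library:

```python
import random

# A hypothetical conditional search space: each operation type carries its own
# hyperparameters, so the keys we sample depend on the chosen operation.
SEARCH_SPACE = {
    "max_layers": 8,
    "operations": {
        "conv": {"num_kernels": [16, 32, 64], "kernel_size": [3, 5], "stride": [1, 2]},
        "pool": {"kernel_size": [2, 3], "stride": [1, 2]},
        "dense": {"num_neurons": [64, 128, 256]},
    },
}

def sample_architecture(space):
    """Randomly draw one architecture description from the search space."""
    num_layers = random.randint(1, space["max_layers"])
    layers = []
    for _ in range(num_layers):
        op = random.choice(list(space["operations"]))
        # Only the hyperparameters that apply to this operation are sampled.
        hparams = {name: random.choice(values)
                   for name, values in space["operations"][op].items()}
        layers.append((op, hparams))
    return layers

print(sample_architecture(SEARCH_SPACE))
```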

We restrict the search space so that it remains feasible to explore the possible combinations of these layers. Based on how the layers are combined, the following types of networks are formed:

  • Chain-structured networks
  • Residual networks
  • Dense networks

Chain-structured networks

In chain-structured networks, we simply stack layers. As shown in the figure below, the i-th layer takes its input from the (i-1)-th layer.

The chain-structured network
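As a rough illustration (using PyTorch, which the article does not prescribe, and arbitrary layer choices), a chain-structured network is just a plain stack in which each layer feeds the next:

```python
import torch
import torch.nn as nn

# A simple chain: layer i consumes only the output of layer i-1.
chain_net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),
)

x = torch.randn(1, 3, 32, 32)   # a dummy image batch
print(chain_net(x).shape)       # torch.Size([1, 10])
```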

Residual networks

Residual networks or ResNets have residual blocks that are like shortcut blocks. We'll add the output of a previous layer to the layer ahead. The connection of layers in a residual network is illustrated in the figure below.

The residual network
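A minimal sketch of a residual block (again in PyTorch, with arbitrary channel sizes) makes the shortcut explicit: the block's input is added to its output.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two conv layers whose output is added to the block's input (the shortcut)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)   # add the shortcut, then activate

x = torch.randn(1, 16, 32, 32)
print(ResidualBlock(16)(x).shape)   # torch.Size([1, 16, 32, 32])
```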

Dense networks

Dense networks, more commonly known as DenseNets, are just like residual networks, except that instead of adding the output of a previous layer to a later one, they concatenate the outputs.

The dense network
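The only structural change from the residual sketch above is the combine step: feature maps are concatenated along the channel dimension rather than added. A rough sketch (channel sizes and layer count are arbitrary):

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer receives the concatenation of all previous feature maps."""
    def __init__(self, in_channels, growth, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(
                nn.Sequential(
                    nn.Conv2d(channels, growth, kernel_size=3, padding=1),
                    nn.ReLU(),
                )
            )
            channels += growth   # concatenation grows the channel count

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))   # concatenate, don't add
            features.append(out)
        return torch.cat(features, dim=1)

x = torch.randn(1, 16, 32, 32)
print(DenseBlock(16, growth=8)(x).shape)   # torch.Size([1, 40, 32, 32])
```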

Search strategy

We define a search strategy to explore the search space. Our goal is to find the optimal architecture quickly, and naive search techniques can take a lot of time. Some widely used techniques are random search, gradient-based methods, Bayesian optimization, and others.
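For instance, the simplest of these, random search, just samples architectures from the search space, evaluates each one, and keeps the best. A hedged sketch, reusing the hypothetical SEARCH_SPACE and sample_architecture from the earlier snippet and an assumed evaluate callback:

```python
import random

def random_search(space, evaluate, num_trials=20):
    """Sample architectures at random and keep the best-scoring one.

    `evaluate` is assumed to return a validation score for an architecture;
    in practice it would train (or partially train) the candidate model.
    """
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture(space)     # helper from the earlier sketch
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

# Example usage with a dummy evaluator that just returns a random score.
best, score = random_search(SEARCH_SPACE, evaluate=lambda arch: random.random())
print(score, best)
```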

Performance estimation strategy

When searching for the optimal architecture, we need some way to compare the performance of candidate architectures. The simplest approach would be to fully train and test each model, but this requires an enormous amount of computation and can take thousands of GPU days to complete.

There are a number of approaches that researchers have employed, but the most intuitive is to extrapolate the learning curves of the models from their initial stage of training.
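As a toy illustration of that idea (not any particular published extrapolation method), one could fit a simple saturating curve to the first few epochs of validation accuracy and project it forward. The curve form and the sample numbers below are illustrative assumptions:

```python
import numpy as np

def extrapolate_accuracy(early_accuracies, target_epoch):
    """Fit a crude saturating curve acc(t) = a + b / t to the observed
    early epochs and use it to predict accuracy at a later epoch."""
    epochs = np.arange(1, len(early_accuracies) + 1)
    # Linear least squares in the basis [1, 1/t].
    design = np.column_stack([np.ones(len(epochs)), 1.0 / epochs])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(early_accuracies), rcond=None)
    a, b = coeffs
    return a + b / target_epoch

# Validation accuracies from the first five epochs of a hypothetical candidate.
early = [0.42, 0.55, 0.61, 0.64, 0.66]
print(round(extrapolate_accuracy(early, target_epoch=100), 3))
```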

Note: To learn more about NAS and related practices, please refer to the research article by Elsken et al., "Neural architecture search: A survey."

