A normalizing flow is a function, or a chain of composed functions, that takes samples from a given data distribution and transforms them into a simple, tractable Gaussian (normal) distribution.
The functions in these flows must be invertible. Invertibility implies a one-to-one mapping between inputs and outputs: for every input there is exactly one deterministic output, and every output is the result of exactly one input.
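As a minimal sketch of this one-to-one behavior (the affine map and its parameters below are illustrative assumptions, not taken from any particular library), an invertible function paired with its inverse recovers the original input exactly:

```python
import numpy as np

def forward(x, scale=2.0, shift=1.0):
    # Invertible affine map: every input maps to exactly one output.
    return scale * x + shift

def inverse(z, scale=2.0, shift=1.0):
    # The unique inverse: every output maps back to exactly one input.
    return (z - shift) / scale

x = np.array([0.5, -1.2, 3.0])
z = forward(x)
assert np.allclose(inverse(z), x)  # one-to-one: the round trip is exact
```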
In the case of three composite functions, a normalizing flow is represented as follows:

$$z = f_3(f_2(f_1(x)))$$

Here, $x$ is a sample from the data distribution, $z$ is the corresponding sample from the normal distribution, and $f_1$, $f_2$, and $f_3$ are invertible functions.
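Continuing the sketch, a chain of three invertible maps (illustrative placeholders for $f_1$, $f_2$, and $f_3$) composes the same way, and inverting the chain applies the individual inverses in reverse order:

```python
import numpy as np

# Three illustrative invertible maps f1, f2, f3 and their inverses.
f1, f1_inv = lambda x: 2.0 * x,    lambda z: z / 2.0
f2, f2_inv = lambda x: x + 3.0,    lambda z: z - 3.0
f3, f3_inv = lambda x: np.tanh(x), lambda z: np.arctanh(z)

def flow(x):
    # z = f3(f2(f1(x)))
    return f3(f2(f1(x)))

def flow_inv(z):
    # Inverses compose in reverse order: x = f1^-1(f2^-1(f3^-1(z)))
    return f1_inv(f2_inv(f3_inv(z)))

x = np.array([0.1, -0.2, 0.05])
assert np.allclose(flow_inv(flow(x)), x)
```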
In supervised learning, we provide the model with labeled training data, and the model adapts its parameters to best fit the distribution of that data. In contrast, the input data does not always need to be labeled, and sometimes the problem cannot be framed as a labeled learning task at all.
For instance, when we generate a colored image from a grayscale image and a contextual color palette, we can't label the input data. This is an example of an unsupervised learning problem that we can address using techniques such as normalizing flows.
Let $z = f(x)$, where $f$ is an invertible function that maps a sample $x$ from the data distribution to a sample $z$ from the normal distribution, with densities $p_X$ and $p_Z$, respectively.

If we consider a change in variables during the transformation, we can describe the probability of $x$ as follows:

$$p_X(x) = p_Z(f(x)) \left| \det \frac{\partial f(x)}{\partial x} \right|$$

The Jacobian matrix of $f$, $\frac{\partial f(x)}{\partial x}$, collects the partial derivatives of each output of $f$ with respect to each input. Let's note that the absolute value of its determinant measures how the transformation locally stretches or shrinks volume, which is what keeps the total probability equal to one.
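As an illustration of the change-of-variables formula (the 2D affine flow and its parameters are assumptions made for the example), the density of $x$ can be evaluated by combining the base density with the log-determinant of the Jacobian:

```python
import numpy as np

# Illustrative invertible affine flow z = A x + b in 2D (parameters are assumptions).
A = np.array([[2.0, 0.0],
              [0.5, 1.0]])
b = np.array([1.0, -1.0])

def f(x):
    return A @ x + b

def log_standard_normal(z):
    # Log-density of the standard normal base distribution p_Z.
    d = z.shape[0]
    return -0.5 * z @ z - 0.5 * d * np.log(2.0 * np.pi)

def log_px(x):
    # Change of variables: log p_X(x) = log p_Z(f(x)) + log|det J_f(x)|.
    # For an affine map, the Jacobian is the constant matrix A.
    z = f(x)
    log_det = np.log(np.abs(np.linalg.det(A)))
    return log_standard_normal(z) + log_det

print(log_px(np.array([0.3, -0.7])))
```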
Generative models aim to learn the underlying probability distribution that governs the training dataset. An example of a problem that calls for generative modeling is inferring the next word in a speech sample, given the first few frames of the audio signal.
Flow-based models have been formulated for both audio and image synthesis. WaveFlow and WaveNODE are examples of generative audio models that leverage this transformation.
In the context of image generation, RealNVP, a normalizing flow-based generative model for images, has been used to synthesize medical images.
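As a rough sketch of the affine coupling idea that RealNVP builds on (the scale and translation networks are replaced here with fixed toy functions, purely as an assumption for brevity), half of the input passes through unchanged, which is what makes the layer cheap to invert:

```python
import numpy as np

def coupling_forward(x):
    # Split the input; the first half passes through unchanged.
    x1, x2 = x[:2], x[2:]
    # Toy stand-ins for the scale/translation networks s(x1), t(x1).
    s, t = np.tanh(x1), 0.5 * x1
    z2 = x2 * np.exp(s) + t  # affine transform of the second half
    return np.concatenate([x1, z2])

def coupling_inverse(z):
    z1, z2 = z[:2], z[2:]
    s, t = np.tanh(z1), 0.5 * z1  # recomputable because z1 equals x1
    x2 = (z2 - t) * np.exp(-s)
    return np.concatenate([z1, x2])

x = np.array([0.4, -1.0, 2.0, 0.3])
assert np.allclose(coupling_inverse(coupling_forward(x)), x)
```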
Because the functions are one-to-one, there is no dimensionality reduction: every layer must preserve the full dimension of the input, which makes the computation time-consuming.
They are not bound to approximate the density of the dataset; the likelihood is computed exactly.
The training process is easier and more deterministic because the functions are one-to-one, which reduces the need to tweak the model's hyperparameters.
The model eliminates unnecessary feature representations and, hence, reduces noise.
The model thoroughly learns the exact distribution of the training dataset.