In zero-shot learning (ZSL), generative models can be used to bridge the gap between known and unknown classes by producing data representations or samples for unknown classes that were not observed during training.
Note: Check out this Answer on ZSL for a comprehensive insight into the concept.
Generative models belong to a category of machine learning models that have been specifically developed to create data that closely resembles the data they were trained on. These models are adept at understanding and capturing the structure and patterns within their training data. This enables them to generate new samples of data that bear a striking resemblance to the original training dataset.
There are many different types of generative models, but two of the most common are:
Variational autoencoders (VAEs)
Generative adversarial networks (GANs)
VAEs are a type of autoencoder that represents the data distribution probabilistically. They are made up of an encoder network, which maps input data into a probabilistic latent space, and a decoder network, which reconstructs data from samples drawn from that latent space.
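Under the hood, a VAE is trained by maximizing the evidence lower bound (ELBO), which trades off reconstruction quality against how closely the latent distribution matches a prior:

$$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)$$

Here, $q_\phi(z \mid x)$ is the encoder, $p_\theta(x \mid z)$ is the decoder, and $p(z)$ is usually a standard normal prior. The code example later in this Answer minimizes exactly this kind of objective: a reconstruction term plus a KL term.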
GANs are made up of two neural networks that are trained at the same time:
Generator: The generator generates data samples.
Discriminator: The discriminator determines how closely these samples resemble actual data.
The training procedure is a competitive game in which the generator attempts to produce increasingly realistic samples while the discriminator attempts to distinguish real data from generated data.
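To make this adversarial setup concrete, here is a minimal, illustrative sketch of one GAN training step in TensorFlow (the same framework used in the VAE example later in this Answer). The network sizes, names, and hyperparameters are assumptions chosen for brevity, not a definitive implementation:

```python
import tensorflow as tf

noise_dim, data_dim = 16, 32  # illustrative dimensions

# Generator: maps random noise to a synthetic data sample
generator = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(noise_dim,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(data_dim, activation='sigmoid')
])

# Discriminator: outputs the probability that a sample is real
discriminator = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(data_dim,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

bce = tf.keras.losses.BinaryCrossentropy()
gen_opt = tf.keras.optimizers.Adam(1e-4)
disc_opt = tf.keras.optimizers.Adam(1e-4)

def train_step(real_batch):
    noise = tf.random.normal([tf.shape(real_batch)[0], noise_dim])
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        fake_batch = generator(noise, training=True)
        real_pred = discriminator(real_batch, training=True)
        fake_pred = discriminator(fake_batch, training=True)
        # Discriminator: classify real samples as 1 and generated samples as 0
        disc_loss = (bce(tf.ones_like(real_pred), real_pred)
                     + bce(tf.zeros_like(fake_pred), fake_pred))
        # Generator: fool the discriminator into predicting 1 for fakes
        gen_loss = bce(tf.ones_like(fake_pred), fake_pred)
    disc_grads = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    gen_grads = gen_tape.gradient(gen_loss, generator.trainable_variables)
    disc_opt.apply_gradients(zip(disc_grads, discriminator.trainable_variables))
    gen_opt.apply_gradients(zip(gen_grads, generator.trainable_variables))
    return gen_loss, disc_loss
```

Each call to `train_step` plays one round of the game described above: the discriminator sharpens its real-versus-generated decision boundary, while the generator updates its weights to cross it.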
Note: To gain additional insights into generative methods, check out this Answer on ZSL.
In ZSL, generative models can aid in the following ways:
Synthetic data generation: GANs can be used to generate synthetic examples of unknown classes.
For example, if we have a GAN trained on a dataset of bird photographs and want to perform ZSL for a new, unseen bird class, we can use the GAN to produce synthetic images of that class and then use them for classification.
Space embedding: VAEs and other generative models can learn continuous latent spaces in which data points are embedded. This embedding helps map unknown classes into the latent space, enabling zero-shot recognition.
Transfer learning: Generative models can be integrated into a broader ZSL pipeline that includes transfer learning. Pretrained generative models can be fine-tuned on a smaller dataset that includes instances of both known and unknown classes, allowing them to adapt to the particular recognition task.
Class-conditional generation: In ZSL, we often have access to some information regarding unknown classes, such as textual descriptions, semantic embeddings, or attributes. Generative models can be trained with this knowledge to produce data for unknown classes.
For instance, we can train a GAN on class attributes to produce images that correspond to classes that haven’t been seen before (a sketch of this idea appears after this list).
Data augmentation: We can leverage generative models to enhance our training data for known classes, increasing the variety of our training samples. Because the model learns a more robust feature representation, its generalization to unknown classes can improve.
Domain adaptation: Generative models can also be employed for domain adaptation, which is the process of adapting a model trained on one domain (e.g., one dataset) to perform well on another. Generative models can aid in the generation of synthetic data that’s similar to the target domain, making the adaptation process more successful.
Fine-tuning with generated data: We can utilize generated data to fine-tune a pretrained model for unknown classes. The generative model’s capacity to produce samples that resemble unknown classes helps guide this fine-tuning process.
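To make the class-conditional generation idea concrete, here is the small sketch referenced above: a generator conditioned on class-attribute vectors, so that features for an unseen class can be synthesized from its attributes alone. All dimensions, layer sizes, and names are hypothetical; in a real pipeline, the generator would first be trained adversarially on seen classes:

```python
import tensorflow as tf

attr_dim, noise_dim, feat_dim = 10, 16, 32  # hypothetical sizes

# Conditional generator: concatenates random noise with class attributes,
# so the class description steers what gets generated
noise_in = tf.keras.layers.Input(shape=(noise_dim,))
attr_in = tf.keras.layers.Input(shape=(attr_dim,))
h = tf.keras.layers.Concatenate()([noise_in, attr_in])
h = tf.keras.layers.Dense(64, activation='relu')(h)
out = tf.keras.layers.Dense(feat_dim)(h)
cond_generator = tf.keras.Model([noise_in, attr_in], out)

# Synthesize 100 feature vectors for an unseen class from its attributes alone
# (a stand-in attribute vector; in practice it comes from side information)
unseen_attrs = tf.random.uniform((1, attr_dim))
noise = tf.random.normal((100, noise_dim))
attrs = tf.repeat(unseen_attrs, repeats=100, axis=0)
synthetic_features = cond_generator([noise, attrs])
```

The resulting `synthetic_features` can then be used to train or fine-tune an ordinary classifier for the unseen class, which is also how the fine-tuning point above is typically realized.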
The following code creates a VAE with TensorFlow, trains it on random data, and then generates synthetic data samples using the trained VAE:
```python
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import numpy as secret_numpy
import matplotlib.pyplot as secret_plot
import tensorflow as secret_tf

# Variational Autoencoder (VAE) architecture
class SecretAutoencoder:
    def __init__(self, input_size, latent_size):
        self.input_size = input_size
        self.latent_size = latent_size

        # Encoder network
        self.encoder = secret_tf.keras.Sequential([
            secret_tf.keras.layers.Input(shape=(input_size,)),
            secret_tf.keras.layers.Dense(256, activation='relu'),
            secret_tf.keras.layers.Dense(2 * latent_size)
        ])

        # Decoder network
        self.decoder = secret_tf.keras.Sequential([
            secret_tf.keras.layers.Input(shape=(latent_size,)),
            secret_tf.keras.layers.Dense(256, activation='relu'),
            secret_tf.keras.layers.Dense(input_size, activation='sigmoid')
        ])

    def reparameterize(self, mean, logvar):
        epsilon = secret_tf.random.normal(shape=secret_tf.shape(mean))
        return mean + secret_tf.exp(0.5 * logvar) * epsilon

    def encode(self, x):
        enc_output = self.encoder(x)
        mean, logvar = secret_tf.split(enc_output, num_or_size_splits=2, axis=1)
        z = self.reparameterize(mean, logvar)
        return z, mean, logvar

    def decode(self, z):
        return self.decoder(z)

    def compute_kl_divergence(self, mean, logvar):
        return -0.5 * secret_tf.reduce_sum(1 + logvar - secret_tf.square(mean) - secret_tf.exp(logvar), axis=1)

    def train(self, x, learning_rate=0.001, num_epochs=100):
        optimizer = secret_tf.keras.optimizers.Adam(learning_rate)
        loss_history = []  # Store loss history

        for epoch in range(num_epochs):
            with secret_tf.GradientTape() as tape:
                z, mean, logvar = self.encode(x)
                x_reconstructed = self.decode(z)
                kl_divergence = self.compute_kl_divergence(mean, logvar)
                reconstruction_loss = secret_tf.reduce_mean(secret_tf.square(x - x_reconstructed))
                loss = reconstruction_loss + secret_tf.reduce_mean(kl_divergence)

            # Backpropagation: compute and apply gradients
            gradients = tape.gradient(loss, self.encoder.trainable_variables + self.decoder.trainable_variables)
            optimizer.apply_gradients(zip(gradients, self.encoder.trainable_variables + self.decoder.trainable_variables))

            loss_history.append(loss)  # Store the current loss

            if (epoch + 1) % 10 == 0:
                print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {loss:.4f}")

        # Loss history display
        secret_plot.figure()
        secret_plot.plot(loss_history)
        secret_plot.xlabel('Epoch')
        secret_plot.ylabel('Loss')
        secret_plot.title('Training Loss')
        secret_plot.savefig("output/secret_autoencoder_loss.png", dpi=300)

    def generate(self, num_samples=1):
        z = secret_tf.random.normal(shape=(num_samples, self.latent_size))
        return self.decode(z)

if __name__ == "__main__":
    # Random data generation
    secret_data = secret_numpy.random.rand(100, 32)
    input_size = secret_data.shape[1]
    latent_size = 16

    secret_vae = SecretAutoencoder(input_size, latent_size)

    num_epochs = 50
    learning_rate = 0.001

    secret_data = secret_tf.convert_to_tensor(secret_data, dtype=secret_tf.float32)
    secret_vae.train(secret_data, learning_rate=learning_rate, num_epochs=num_epochs)

    # Samples generation from the VAE
    secret_generated_samples = secret_vae.generate(num_samples=5)
    for i, sample in enumerate(secret_generated_samples):
        print(f"Generated Sample {i + 1}: {sample}")
```
Here’s the explanation of the above code:
Lines 15–19: The encoder network is defined in the VAE class’s constructor. It accepts an input of shape `(input_size,)`, applies a dense layer with `relu` activation, and outputs `2 * latent_size` values, which are later split into the mean and log variance of the latent distribution.
Lines 22–26: The decoder network is also defined in the constructor. It accepts a latent vector of shape `(latent_size,)`, applies a dense layer with `relu` activation, and produces an output with the same shape as the input data, using a `sigmoid` activation on the final layer (assuming the data lies in the range `[0, 1]`).
Lines 28–30: The `reparameterize` method samples a latent vector from the mean and log variance produced by the encoder network (the reparameterization trick).
Lines 32–36: The `encode` method accepts an input `x`, passes it through the encoder, and returns the latent vector `z`, the mean, and the log variance.
Lines 38–39: The `decode` method accepts a latent vector `z` and returns the reconstructed data.
Lines 41–42: The `compute_kl_divergence` method computes the Kullback-Leibler (KL) divergence between the learned latent distribution and a standard normal prior (the closed-form expression is shown after this walkthrough).
Lines 44–45: The `train` method trains the VAE. It computes the reconstruction loss and KL divergence and optimizes the overall loss using the `Adam` optimizer.
Lines 48–55: To train the VAE, the loop iterates over the specified number of epochs (`num_epochs`), running the forward pass and computing the loss inside a `GradientTape` context.
Lines 57–58: Gradients of the total loss with respect to the trainable variables of both the encoder and decoder are computed from the `GradientTape` context, and the specified optimizer (`Adam` in this case) uses them to update the model’s weights and biases.
Lines 66–71: These lines plot the training loss history and save the figure.
Lines 73–75: The `generate` method generates new data samples by sampling from the latent space and decoding the samples.
Lines 78–94: Random data is created, the VAE is instantiated and trained on it, and the trained model is then used to produce new samples.
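For reference, the `compute_kl_divergence` method implements the standard closed-form KL divergence between the learned Gaussian $\mathcal{N}(\mu, \sigma^2)$ and a standard normal prior, which is exactly what line 42 of the code computes:

$$D_{KL} = -\frac{1}{2}\sum_{j=1}^{d}\left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right)$$

where $d$ is the latent dimensionality and $\log\sigma_j^2$ corresponds to `logvar` in the code.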
Expected output: The code displays the training loss history in the form of a graph, generates five synthetic data samples, and then displays those generated samples.
Generative models such as VAEs and GANs have demonstrated potential in ZSL tasks. These tasks involve recognizing or categorizing objects or concepts that the model hasn’t been trained on. However, these models also have a few drawbacks when used for ZSL:
Scaling these models to a large number of unseen classes is challenging. As the number of classes grows, it becomes harder to generate meaningful instances for each class, which can compromise the model’s effectiveness.
Training generative models, especially GANs, can be computationally intensive and might require significant resources, which can limit their use in ZSL applications.
If the distribution of unseen classes is noticeably different from that of the data used for training, the generative models might encounter difficulties in producing representations for those unseen classes.
GANs in particular are vulnerable to mode collapse, which occurs when the generator concentrates on creating only a subset of samples while ignoring the variety of the target distribution. As a result, the generative model might be unable to appropriately represent unseen classes.
Generative models can produce samples that are neither diverse nor representative of the unknown classes, resulting in poor generalization. In ZSL, it’s critical that generated examples cover the whole range of variation within unknown classes.