This Answer explains the mechanism behind the generator for our AI anime GAN model, which allows us to generate anime character images in real time! After understanding the concepts behind the generator and the discriminator, we will be ready to write our own code for generating AI anime character images. Without further ado, let's get started!
A generator is the component of a GAN responsible for producing data that resembles the real samples. As the generator improves, the model can produce data that is virtually indistinguishable from reality, which makes the discriminator's task of telling real data from generated data harder.

This continuous back-and-forth between the two networks gradually pushes the GAN toward more realistic outputs: the generator keeps producing samples that are harder for the discriminator to catch.
Note: GANs (generative adversarial networks) are a machine learning framework consisting of two neural networks:

- The generator creates data.
- The discriminator evaluates it.
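To make the adversarial setup concrete, here is a minimal NumPy sketch of the two training signals. The `generator` and `discriminator` functions below are toy stand-ins for illustration only, not the Keras model we build later in this Answer:

```python
import numpy as np

def discriminator(x):
    # Toy stand-in: squashes a score into (0, 1), where 1 means "real."
    return 1 / (1 + np.exp(-x.mean()))

def generator(z):
    # Toy stand-in: maps random noise to "data."
    return z * 0.5

rng = np.random.default_rng(0)
real = rng.normal(1.0, 0.1, size=8)   # pretend "real" samples
z = rng.normal(size=8)                # noise input
fake = generator(z)

# The discriminator wants D(real) -> 1 and D(fake) -> 0;
# the generator wants D(fake) -> 1. Both losses are positive
# until each network achieves its goal perfectly.
d_loss = -np.log(discriminator(real)) - np.log(1 - discriminator(fake))
g_loss = -np.log(discriminator(fake))
print(d_loss > 0, g_loss > 0)
```

Training alternates between the two objectives, which is the "continuous exchange" described above.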
Before we continue with the generator's code for our AI anime model, let's go over a few crucial concepts, with easy explanations, to build your understanding upon.
| Concept | Explanation |
|---|---|
| Sequential model | A Sequential model is a linear stack of layers in a neural network. It lets us build the network layer by layer, in order. |
| Input dimensions | The input dimension specifies the size of the input vector. |
| Padding | A padding value of "same" keeps the output size the same as the input size. |
| Kernel initializer | A kernel initializer sets how the kernel's weights are initialized. |
| Dense layer | A Dense layer with 8 * 8 * 512 neurons projects the 100-dimensional input vector into a high-dimensional space. |
| ReLU activation | ReLU activation introduces non-linearity into the model, which helps it capture complex patterns. |
| Reshaping | The Reshape layer transforms the dense layer's output into a 3D tensor of shape 8 * 8 * 512, preparing the data for the convolutional layers that follow. |
| Deconvolutional blocks | Conv2DTranspose layers upsample the data. Moving through the blocks, the spatial dimensions grow while the number of filters shrinks, progressively refining the image. |
| Output layer | The final Conv2D layer combines the patterns learned in earlier layers to produce the complete picture. |
| Tanh activation | The tanh activation function maps values to the range -1 to 1. |
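As a quick sanity check on the upsampling idea, we can trace how the spatial dimensions grow through three stride-2 deconvolutional blocks. With padding "same", Keras computes a transposed convolution's output size as input size × stride (the helper below is our own illustration, not a Keras API):

```python
def conv2d_transpose_out(size, stride, padding="same", kernel=4):
    """Output spatial size of a Conv2DTranspose layer (Keras conventions)."""
    if padding == "same":
        # 'same' padding: output size is exactly size * stride.
        return size * stride
    # 'valid' padding: output size is (size - 1) * stride + kernel.
    return (size - 1) * stride + kernel

# Trace the generator's spatial dimensions: 8 -> 16 -> 32 -> 64.
size = 8
for _ in range(3):
    size = conv2d_transpose_out(size, 2)
print(size)  # 64
```

So three doublings turn the 8 x 8 feature map into a 64 x 64 image.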
We're now ready to write our own generator code. The core concepts explained above make up most of the code, which we'll write using TensorFlow and Keras.
```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

generator = Sequential(name='generator')
generator.add(layers.Dense(8 * 8 * 512, input_dim=100))
generator.add(layers.ReLU())
generator.add(layers.Reshape((8, 8, 512)))
generator.add(layers.Conv2DTranspose(256, (4, 4), strides=(2, 2), padding='same',
                                     kernel_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.02)))
generator.add(layers.ReLU())
generator.add(layers.Conv2DTranspose(128, (4, 4), strides=(2, 2), padding='same',
                                     kernel_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.02)))
generator.add(layers.ReLU())
generator.add(layers.Conv2DTranspose(64, (4, 4), strides=(2, 2), padding='same',
                                     kernel_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.02)))
generator.add(layers.ReLU())
generator.add(layers.Conv2D(3, (4, 4), padding='same', activation='tanh'))
```
Let's go through the steps one by one.
We start by creating a Sequential model called "generator".
We add a `Dense` layer with 8 * 8 * 512 (32,768) units, fed by an input vector of size 100.
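As a quick sanity check on this layer, we can compute its trainable parameter count by hand (assuming the bias term that Keras enables by default):

```python
input_dim = 100
units = 8 * 8 * 512          # 32,768 neurons
weights = input_dim * units  # one weight per input-output pair
biases = units               # one bias per neuron (Keras default use_bias=True)
print(weights + biases)      # 3309568
```

This matches the figure `generator.summary()` reports for the dense layer.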
Here, we add a `ReLU` activation function to introduce non-linearity.
We reshape the output into a 3D tensor of shape 8 x 8 x 512 using the `Reshape` layer.
We add a 2D transposed convolutional layer, `Conv2DTranspose` (sometimes called deconvolution), with 256 filters. The (2, 2) strides double the spatial dimensions in both height and width, and we set the padding to "same". The kernel weights are initialized from a random normal distribution with a mean of 0 and a standard deviation of 0.02.
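To see what this layer does to the tensor shapes and how many parameters it adds, here is a small hand calculation (assuming Keras conventions: "same" padding and the default bias term):

```python
# Spatial size: with padding='same' and stride 2, the output is input * stride.
in_h = in_w = 8
stride = 2
out_h, out_w = in_h * stride, in_w * stride   # 8 x 8 -> 16 x 16

# Parameters: one 4x4 kernel per (input channel, output channel) pair,
# plus one bias per output filter.
kh = kw = 4
in_filters, out_filters = 512, 256
params = kh * kw * in_filters * out_filters + out_filters
print(out_h, out_w, params)  # 16 16 2097408
```

The same arithmetic applies to the later deconvolutional blocks, just with fewer filters.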
We again apply a ReLU activation to the output of our model's previous layer.
Next, we add another 2D transposed convolutional layer with 128 filters and the same kernel size and padding as before. Its kernel weights are initialized the same way.
Again, we apply a `ReLU` activation to the output.
We add a third 2D transposed convolutional layer with 64 filters along with the same settings.
Next, we apply yet another `ReLU` activation to the output.
Finally, we add a regular 2D convolutional layer, `Conv2D`, with 3 filters, each of size 4 x 4. The padding is "same", and the activation function is the hyperbolic tangent (tanh). This layer outputs an image with 3 color channels.
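The tanh activation is what bounds every output pixel. A tiny NumPy check makes the range concrete (this is also why training images for GANs are typically rescaled to [-1, 1] before being fed to the discriminator):

```python
import numpy as np

# tanh squashes any real input strictly into (-1, 1),
# so every generated pixel value lands in that range.
x = np.linspace(-10, 10, 1001)
y = np.tanh(x)
print(y.min() > -1, y.max() < 1)
```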
```python
generator.summary()
```
We can now display a summary of the model's architecture using `generator.summary()`, showing the layers, output shapes, parameter counts, etc.
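As a cross-check on the summary, we can tally the trainable parameters layer by layer (weights plus biases, assuming the Keras default bias terms):

```python
def conv_params(fin, fout, k=4):
    # k x k kernel per (input, output) channel pair, plus one bias per filter.
    return k * k * fin * fout + fout

dense = 100 * (8 * 8 * 512) + 8 * 8 * 512   # Dense weights + biases
total = (dense
         + conv_params(512, 256)    # first Conv2DTranspose
         + conv_params(256, 128)    # second Conv2DTranspose
         + conv_params(128, 64)     # third Conv2DTranspose
         + conv_params(64, 3))      # final Conv2D output layer
print(total)  # 6065603
```

If the figure printed by `generator.summary()` differs, a layer was likely configured differently from the code above.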
We have successfully learned how to set up a generator for our AI anime GAN model. In summary, the generator starts from a low-dimensional noise input and gradually transforms it, through deconvolutional layers, into a higher-dimensional, more realistic image. Each deconvolutional layer increases the generated image's spatial dimensions, producing sharper images. The final output, activated by tanh, represents an AI-generated image that resembles real images if the generator is trained well.
Note: The complete GAN code for generating new images using the discriminator and generator is present here. After you've understood both of these Answers, you can continue with the rest of the code.
Take this quick recap to revise your concepts:

- The `Conv2D` layer is a convolutional layer that is used to extract features from images.
- The `ReLU` activation introduces non-linearity to the model, enabling it to recognize complex patterns.
- A generator tries to create more authentic images that are difficult for the discriminator to declare fake.