What The Heck Are VAE-GANs
2023.01.28, Hubei

Yep, you read the title correctly. While a few friends of mine are vegans, none of them knew anything about VAE-GANs. VAE-GAN stands for Variational Autoencoder-Generative Adversarial Network (that is one heck of a name). Before we get started, I must confess that I am no expert in this subject matter (I don't have a PhD in electrical engineering, just sayin'). But after reading several research papers and watching Ian Goodfellow's 30-minute intro to GANs, here is a short (yet concise) summary of my major takeaways:

Images reconstructed by a VAE and a VAE-GAN, compared to their original input images

Variational Autoencoders (VAEs)

The simplest way of explaining variational autoencoders is through a diagram. Alternatively, you can read the excellent article Intuitively Understanding Variational Autoencoders. At this point I assume you have a general idea of what unsupervised learning and generative models are. The textbook definition of a VAE is that it "provides probabilistic descriptions of observations in latent spaces." In plain English, this means VAEs store latent attributes as probability distributions.

"Variational autoencoders" - Jeremy Jordan

Each input image has features that would normally be described as single, discrete values. Variational autoencoders instead describe these values as probability distributions. Decoders can then sample randomly from those probability distributions to obtain their input vectors. Let me guess, you're probably wondering what a decoder is, right? Let's take a step back and look at the general architecture of a VAE.

The typical setup of a variational autoencoder is nothing other than a cleverly designed deep neural network consisting of a pair of networks: the encoder and the decoder. The encoder can be better described as a variational inference network, responsible for mapping an input x to the posterior distribution qθ(z∣x). The likelihood p(x∣z) is then parametrized by the decoder, a generative network that takes latent variables z and parameters as inputs and projects them to the data distribution pϕ(x∣z).
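The encoder-sample-decoder pipeline above can be sketched in a few lines of NumPy. This is a toy illustration, not a trained model: the layer sizes, weight matrices, and function names are all made up, and the weights are random, so the "reconstruction" is meaningless; the point is only the shape of the computation, including the reparameterization trick z = μ + σ·ε used to sample from qθ(z∣x).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (hypothetical, not from the article).
x_dim, h_dim, z_dim = 784, 128, 2

# Randomly initialised weights stand in for a trained network.
W_enc = rng.normal(0, 0.01, (x_dim, h_dim))
W_mu = rng.normal(0, 0.01, (h_dim, z_dim))
W_logvar = rng.normal(0, 0.01, (h_dim, z_dim))
W_dec1 = rng.normal(0, 0.01, (z_dim, h_dim))
W_dec2 = rng.normal(0, 0.01, (h_dim, x_dim))

def encoder(x):
    """Map input x to the parameters of q(z|x): a mean and a log-variance."""
    h = np.tanh(x @ W_enc)
    return h @ W_mu, h @ W_logvar

def sample_z(mu, logvar):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decoder(z):
    """Map a latent z back to data space, parametrizing p(x|z)."""
    h = np.tanh(z @ W_dec1)
    return 1.0 / (1.0 + np.exp(-(h @ W_dec2)))  # sigmoid -> pixel range (0, 1)

x = rng.random((1, x_dim))           # a stand-in "image"
mu, logvar = encoder(x)              # a distribution, not a single point
x_recon = decoder(sample_z(mu, logvar))
print(mu.shape, x_recon.shape)       # (1, 2) (1, 784)
```

Note that the encoder outputs two vectors (mean and log-variance) rather than one: that is exactly the "latent attributes as probability distributions" idea from the paragraph above.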

A major drawback of VAEs is the blurry outputs they generate. As suggested by Dosovitskiy & Brox, VAE models tend to produce unrealistic, blurry samples. This has to do with how data distributions are recovered and loss functions are calculated in VAEs, which we will discuss further below. A 2017 paper by Zhao et al. suggested modifying VAEs to avoid the variational Bayes method in order to improve output quality.

Generative Adversarial Networks (GANs)

The dictionary definition of adversarial is "involving or characterized by conflict or opposition." That is, in my opinion, a very accurate description of what GANs are. Just like VAEs, GANs belong to a class of generative algorithms used in unsupervised machine learning. A typical GAN consists of two neural networks: a generative network and a discriminative network. The generative network is responsible for taking noise as input and generating samples. The discriminative network is then asked to evaluate the generated samples and distinguish them from training data. Much like VAEs, generative networks map latent variables and parameters to data distributions.

The major goal of generators is to generate data that increasingly “fools” the discriminative neural network, i.e. increasing its error rate. This can be done by repeatedly generating samples that appear to be from the training data distribution. A simple way to visualize this is the “competition” between a cop and a cyber criminal. The cyber criminal (generator) attempts to create online identities that resemble ordinary citizens, while the cop (discriminator) tries to distinguish fake profiles from the real ones.
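The cop-and-criminal game above comes down to two opposing loss functions. Here is a minimal NumPy sketch of how they are computed; the discriminator scores are hard-coded stand-ins (no networks are trained), and the generator loss shown is the common non-saturating variant rather than the exact minimax form.

```python
import numpy as np

def bce(p, target):
    """Binary cross-entropy for discriminator outputs p in (0, 1)."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))

# Hypothetical discriminator scores: D(x) on real data, D(G(z)) on fakes.
d_real = np.array([0.9, 0.8, 0.95])   # the "cop" is confident these are real
d_fake = np.array([0.1, 0.2, 0.05])   # ...and confident these are fake

# Discriminator loss: push D(x) toward 1 and D(G(z)) toward 0.
d_loss = bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

# Generator loss: push D(G(z)) toward 1, i.e. fool the cop.
g_loss = bce(d_fake, np.ones_like(d_fake))

print(d_loss, g_loss)
```

With these scores the discriminator is winning, so its loss is small while the generator's is large; training alternates gradient steps on the two losses until the generated samples become hard to distinguish from the training data.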

Variational Autoencoder Generative Adversarial Networks (VAE-GANs)

Okay. Now that we have introduced VAEs and GANs, it's time to discuss what VAE-GANs really are. The term VAE-GAN was first introduced in the paper "Autoencoding beyond pixels using a learned similarity metric" by A. Larsen et al. The authors suggested that the combination of variational autoencoders and generative adversarial networks outperforms traditional VAEs.

VAE-GAN architecture: the GAN discriminator takes its input from the VAE's decoder

Remember how GANs are subdivided into generator and discriminator networks? The authors suggested that a GAN discriminator can be used to learn the VAE's reconstruction loss, replacing its usual element-wise error. The motivation behind this modification is, as mentioned above, that VAEs tend to produce blurry outputs during the reconstruction phase, and this "blurriness" is related to the way the VAE's loss function is calculated. I am not going into the nitty-gritty of how this new loss function is derived; all you need to know is this set of equations:

Learned Losses in VAE-GAN

Now that's a lot of L's. But jokes aside, the above equations assume that the outputs of the discriminator's lth layer differ in a Gaussian manner. As a result, calculating the mean squared error (MSE) between the lth-layer outputs gives us the VAE's reconstruction loss. The final output of the GAN, D(x), is then used to calculate its own loss function.
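The key trick, measuring reconstruction error in the discriminator's lth-layer feature space instead of pixel space, can be sketched as follows. Everything here is illustrative: the two-layer "discriminator" has random weights, and the reconstruction is faked by adding noise to the input, since the point is only where the MSE is taken.

```python
import numpy as np

rng = np.random.default_rng(1)

# A hypothetical tiny discriminator: Dis_l gives the l-th (here: first)
# hidden layer's activations, and D gives the final real/fake probability.
W1 = rng.normal(0, 0.1, (784, 64))
W2 = rng.normal(0, 0.1, (64, 1))

def dis_l(x):
    """Feature activations of the discriminator's hidden layer, Dis_l(x)."""
    return np.tanh(x @ W1)

def D(x):
    """Final discriminator output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(dis_l(x) @ W2)))

x = rng.random((1, 784))                       # original image
x_tilde = x + rng.normal(0, 0.05, x.shape)     # stand-in for the VAE reconstruction

# VAE-GAN reconstruction loss: MSE between l-th layer features, not pixels.
# (This follows from the Gaussian assumption on the l-th layer outputs.)
recon_loss = np.mean((dis_l(x) - dis_l(x_tilde)) ** 2)

# The adversarial loss itself still uses the final output D(x).
gan_loss = -np.log(D(x).item()) - np.log(1 - D(x_tilde).item())

print(recon_loss, gan_loss)
```

Because the feature maps of a discriminator capture perceptually relevant structure rather than raw pixel values, penalizing differences there is what lets VAE-GANs avoid the over-smoothed, blurry reconstructions of plain VAEs.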

Generative models are now on the research agendas of top tech companies such as Facebook. Yann LeCun, a prominent computer scientist and AI visionary, once said: "This (Generative Adversarial Networks), and the variations that are now being proposed, is the most interesting idea in the last 10 years in ML, in my opinion."

Besides VAE-GANs, many other variations of GANs have been researched and implemented. DCGANs, or Deep Convolutional Generative Adversarial Networks, were introduced not long after Ian Goodfellow's original GANs. I am excited to see generative models find their role in future AI applications, potentially improving the quality of our lives.

Thanks for reading my article.
