Nvidia AI research uses computer vision technique to train StyleGAN2 to create high-quality synthetic images using 1000 source images, rather than 100,000…
At NeurIPS, Nvidia AI presented a new method for creating high-quality synthetic images using a generative adversarial network (GAN) trained on 1500 source images. The neural network, StyleGAN2, usually requires a training data set of tens or hundreds of thousands of images to produce high-quality synthetic pictures.
Nvidia’s AI research used a dataset of 1500 images of faces from the Metropolitan Museum of Art to create new images that emulate artworks in the Museum’s collection. While this breakthrough could be used to recreate the style of rare works and create new art inspired by historical portraits, there are wider implications for medical imaging AI.
A key problem facing medical AI models is the lack of available training data, partly due to privacy concerns but especially for rare diseases, where 100,000 images of a certain illness might not even exist. Nvidia’s technique could be used to generate synthetic medical scan images of rare diseases based on the few real images that do exist. This synthetic data could then be used as a dataset for training AI models to detect that disease.
Outside of the medical field, the same method can be used for any problem where huge quantities of AI training data in the form of images are difficult to obtain, or don’t exist. Nvidia hopes this new technique will accelerate AI research in areas where training data is scarce.
Generator vs discriminator
Invented in 2014, generative adversarial networks (GANs) have become the number one way to create synthetic images using AI. As the name suggests, the general concept is based on two AI agents (two separate neural networks) which work against each other. The generator network produces synthetic images based on the training data, and the discriminator looks at the images the generator produces and guesses whether they are real or fake. This guess is fed back into the generator in a feedback loop, until the discriminator can no longer tell the difference between an image from the training dataset and a synthetic image produced by the generator. This technique is behind problematic trends such as AI deepfakes, but the scope of applications is potentially huge.
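The adversarial feedback loop described above can be sketched in a few lines of NumPy. This is a deliberately tiny, illustrative toy (a one-parameter-per-network 1-D GAN, nothing like StyleGAN2), and every name and hyperparameter below is an assumption made for the sketch, but the structure, with a discriminator step pushing real and fake scores apart and a generator step chasing the discriminator, is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# "Real" data for this toy: samples from a normal distribution N(3, 1).
def real_batch(n):
    return rng.normal(3.0, 1.0, n)

# Generator G(z) = a*z + b maps noise to samples;
# discriminator D(x) = sigmoid(w*x + c) scores real vs. fake.
a, b = 1.0, 0.0          # generator parameters
w, c = 0.0, 0.0          # discriminator parameters
lr, steps, n = 0.05, 2000, 64

for _ in range(steps):
    # --- Discriminator step: push D(real) -> 1 and D(fake) -> 0 ---
    x_real = real_batch(n)
    z = rng.normal(0.0, 1.0, n)
    x_fake = a * z + b
    t_real = w * x_real + c
    t_fake = w * x_fake + c
    # Gradients of -log D(real) - log(1 - D(fake)) w.r.t. the logits:
    g_real = sigmoid(t_real) - 1.0
    g_fake = sigmoid(t_fake)
    w -= lr * np.mean(g_real * x_real + g_fake * x_fake)
    c -= lr * np.mean(g_real + g_fake)

    # --- Generator step: push D(fake) -> 1 (non-saturating loss) ---
    z = rng.normal(0.0, 1.0, n)
    x_fake = a * z + b
    g_logit = sigmoid(w * x_fake + c) - 1.0   # grad of -log D(fake)
    a -= lr * np.mean(g_logit * w * z)
    b -= lr * np.mean(g_logit * w)

# After training, the generator's output distribution should have
# drifted toward the real data's mean of 3.0.
fake_mean = float(np.mean(a * rng.normal(0.0, 1.0, 10000) + b))
print(f"generated mean after training: {fake_mean:.2f}")
```

The feedback loop runs until, ideally, the discriminator's scores on real and generated samples become indistinguishable; in a real GAN, the linear maps above are deep convolutional networks.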
The trouble with GANs is that they require huge datasets of training images to produce high-quality results, often in the range of 50,000 to 100,000 images. That's no problem if you want to create new pictures of pets; the internet is famously not short of pictures of cats. But in applications where source images are limited, GANs can't be used: if there are too few images in the training dataset, the discriminator can effectively memorize the entire dataset. This prevents it from providing useful feedback to the generator, since it rejects all synthetic images as fake (a failure known as "overfitting").
Nvidia’s technique to reduce the amount of training data required actually multiplies the amount of data available for training by creating an array of “new” data from each existing image. The idea is based on a method previously used in computer vision, where random transformations or distortions are applied to source images or sections of those images. This rapidly produces a large dataset of flipped, rotated, colored or skewed versions of the original images which can be used to train the GAN.
“You can add noise to them, you can cut out blocks of them, you can slide the image around, you can rotate it, you can mess with the color, you can flip it around like a photographic negative, and so on,” said David Luebke, vice president of graphics research at Nvidia. “You can think of this as a collection of things that you could easily do in Photoshop. Now you’ve got a wealth of imagery, you can set it up so that your GAN trains not just on a small set of training data, but on this effectively infinite set of training images that you can create from that small set.”
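The Photoshop-style transformations Luebke lists can be sketched as a simple NumPy pipeline, with each transform applied at random. The function name and probabilities here are my own illustrative choices, not Nvidia's actual augmentation code.

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(img, p=0.5):
    """Return a randomly transformed copy of a square HxWx3 image in [0, 1].

    Each transform fires independently with probability p.
    """
    out = img.copy()
    if rng.random() < p:                     # flip horizontally
        out = out[:, ::-1, :]
    if rng.random() < p:                     # rotate by 90/180/270 degrees
        out = np.rot90(out, k=int(rng.integers(1, 4)))
    if rng.random() < p:                     # shift each color channel
        out = np.clip(out + rng.uniform(-0.1, 0.1, size=3), 0.0, 1.0)
    if rng.random() < p:                     # add pixel noise
        out = np.clip(out + rng.normal(0.0, 0.05, out.shape), 0.0, 1.0)
    if rng.random() < p:                     # cut out a block
        h, w = out.shape[:2]
        y, x = int(rng.integers(0, h // 2)), int(rng.integers(0, w // 2))
        out[y:y + h // 4, x:x + w // 4] = 0.0
    return out

# One source image becomes an effectively unlimited stream of variants.
img = rng.random((64, 64, 3))
variants = [augment(img) for _ in range(8)]
```

Because the transforms are drawn fresh each time, every training step can see a different version of the same source image, which is what turns a small dataset into an "effectively infinite" one.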
The amount of distortion applied to the images is one of the critical factors affecting the quality of the outcome. Not enough distortion, and the GAN succumbs to overfitting. Too much distortion, and the distortions start creeping into the synthesized images. Nvidia’s researchers developed a technique they call adaptive discriminator augmentation (ADA), which optimizes the amount of distortion introduced into the data to avoid overfitting and produce high-quality synthetic images.
“We need to apply augmentation adaptively; it turns out we need to change how much augmentation we apply adaptively over the course of a training regimen,” Luebke said. “That’s why we call it adaptive discriminator augmentation. We’re adaptively choosing how much distortion we’re doing, and the probability of a distortion in the images that we’re showing to the discriminator.”
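The adaptive control loop Luebke describes can be sketched as a simple feedback rule: measure how overconfident the discriminator is on real images, then nudge the augmentation probability p up or down to keep that signal near a target. The overfitting heuristic and the constants below are simplified assumptions for illustration, not Nvidia's exact values.

```python
import numpy as np

def update_p(p, d_real_logits, target=0.6, step=0.01):
    """Adjust the augmentation probability from discriminator outputs.

    r = E[sign(D(real))] rises toward 1 as the discriminator grows
    overconfident on real images (a sign of overfitting); p is raised
    to distort more of what the discriminator sees, and lowered when
    the overfitting signal subsides.
    """
    r = float(np.mean(np.sign(d_real_logits)))
    p += step if r > target else -step
    return float(np.clip(p, 0.0, 1.0))

# Overfitting discriminator (confidently positive on every real image):
p1 = update_p(0.5, np.array([2.0, 1.5, 3.0, 0.8]))   # p rises
# Healthy discriminator (mixed scores on real images):
p2 = update_p(p1, np.array([0.5, -0.7, 0.2, -0.4]))  # p falls
print(p1, p2)
```

Running this rule throughout training is what makes the augmentation "adaptive": the probability of distorting the discriminator's inputs tracks how close the network is to memorizing the dataset.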
Overall, Luebke hopes this new technique will help to build AI models that detect rare medical conditions, where hundreds of thousands of images don’t exist.
“In situations like this, we think there’s great potential for this technique to improve things,” he said. “The upshot is that we can train high quality GANs on an order of magnitude or less data and still get very good results.”