“Generative AI” refers to artificial intelligence that can be used to create new content, such as words, images, music, code, or video. Generative AI applications became widely popular beginning in 2022, when companies released versions that members of the public, not just experts, could easily use. Prominent examples include ChatGPT, a chatbot that answers questions with detailed written responses, and DALL-E, which creates realistic images and art from text prompts.
Generative AI systems are powerful because they are trained on extremely large datasets, which can draw on much of the publicly available information on the internet. Today's generative AI models produce content that is often indistinguishable from content created by humans.
Scientists and engineers have used several approaches to create generative AI applications. Prominent models include generative adversarial networks, or GANs; variational autoencoders, or VAEs; diffusion models; and transformer-based models.
GANs typically work with image data and make use of two components: a "generator" to create new content based on training data and a "discriminator" to decide whether the generated content is real or fake. The generator and the discriminator work as adversaries, much like a counterfeiter trying to fool an authenticator. If the discriminator decides that the output is fake, this provides feedback to the generator to improve the output. The two go through multiple iterations until the discriminator cannot determine if the output is real or fake.
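In code, one round of this adversarial contest can be sketched in a few lines. The PyTorch sketch below is illustrative only: the tiny fully connected networks, the flattened 28-by-28 "images," and the layer sizes are simplifying assumptions, not the architecture of any particular GAN.

```python
import torch
import torch.nn as nn

# Toy generator: turns random noise into a flat 28x28 "image".
generator = nn.Sequential(
    nn.Linear(64, 256), nn.ReLU(),
    nn.Linear(256, 28 * 28), nn.Tanh(),
)

# Toy discriminator: scores an image as real (1) or fake (0).
discriminator = nn.Sequential(
    nn.Linear(28 * 28, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
loss_fn = nn.BCELoss()

def training_step(real_images):          # real_images: (batch, 784)
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1. Train the discriminator to tell real images from generated ones.
    noise = torch.randn(batch, 64)
    fakes = generator(noise).detach()    # don't backprop into the generator here
    d_loss = loss_fn(discriminator(real_images), real_labels) + \
             loss_fn(discriminator(fakes), fake_labels)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2. Train the generator to fool the discriminator: its "win" is
    #    the discriminator labeling its fakes as real.
    noise = torch.randn(batch, 64)
    g_loss = loss_fn(discriminator(generator(noise)), real_labels)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

Over many such steps, the generator's only route to a low loss is producing images the discriminator scores as real, which is exactly the counterfeiter-versus-authenticator dynamic described above.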
Autoencoders were created to learn efficient data representations (compact ways of encoding data) and are used for purposes such as data compression (reducing the size of data) and noise reduction (removing unwanted noise from a signal). They make use of an "encoder," which analyzes a dataset and reduces its complexity while still preserving essential features of the data, and a "decoder," which uses the reduced data representation to recreate something nearly identical to the original. To adapt this for generative applications, "variational autoencoders," or VAEs, intentionally introduce variations, like small mistakes, into the encoding step. When the decoder attempts to recreate the original data, it encounters the variations and "accidentally" generates new content.
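The "variations" are not literal mistakes but random sampling built into the encoder. In the minimal sketch below (the layer and latent sizes are arbitrary choices for illustration), the encoder outputs a mean and a spread rather than a single compressed point, and the decoder reconstructs from a randomly sampled point nearby:

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, data_dim=784, latent_dim=16):
        super().__init__()
        self.encoder = nn.Linear(data_dim, 128)
        # The encoder describes a distribution (a mean and a spread),
        # not a single compressed point -- this is the "variation".
        self.to_mean = nn.Linear(128, latent_dim)
        self.to_logvar = nn.Linear(128, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, data_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = torch.relu(self.encoder(x))
        mean, logvar = self.to_mean(h), self.to_logvar(h)
        # Sample a point near the mean: the injected randomness.
        z = mean + torch.randn_like(mean) * torch.exp(0.5 * logvar)
        return self.decoder(z), mean, logvar

# After training, generating new content is just decoding random
# latent points, e.g.: new_image = model.decoder(torch.randn(1, 16))
```

Because the decoder learns to reconstruct from noisy neighborhoods rather than exact points, any random point in that latent space decodes to plausible new content.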
Diffusion models are used by applications such as DALL-E and Stable Diffusion. They are inspired by the random motion of atoms and molecules. Think of the way food coloring disperses when dripped into a glass of water. As the color spreads, we can map how the individual molecules move through space and, in principle, run that movement in reverse to recover the original drop. In generative AI applications, diffusion models are trained on images instead of drops of food coloring. Within those images, they digitally scatter the pixels step by step, mimicking physical diffusion, thereby destroying the original image and leaving a field of blurry static. The model then analyzes that "diffusion" from clear image to static. As the model applies this technique to many images of a particular category, such as photos of turtles, it becomes an expert at tracing the movement of the blurry set of pixels backward to the original clear image. The model can then take an image of random static, move the pixels according to what it has learned, and generate a new image of a turtle.
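A stripped-down version of the training side looks like the following. Real systems condition the denoising network on the timestep and use a large U-Net; the stand-in network and the schedule values here are assumptions made to keep the sketch short.

```python
import torch
import torch.nn as nn

T = 1000                                   # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)      # how much noise each step adds
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def add_noise(clean_image, t):
    """Forward process: jump straight to step t of the 'food coloring'
    dispersal by blending the image with Gaussian noise."""
    noise = torch.randn_like(clean_image)
    a = alphas_bar[t]
    noisy = a.sqrt() * clean_image + (1 - a).sqrt() * noise
    return noisy, noise

# Training target: a network learns to predict the noise that was
# added, which is what lets it run the diffusion backward. A real
# model is a U-Net that also takes t as input; this MLP is a stand-in.
denoiser = nn.Sequential(nn.Linear(784, 512), nn.ReLU(), nn.Linear(512, 784))

def training_loss(clean_image):            # clean_image: (batch, 784)
    t = torch.randint(0, T, (1,))
    noisy, true_noise = add_noise(clean_image, t)
    predicted_noise = denoiser(noisy)
    return ((predicted_noise - true_noise) ** 2).mean()
```

Generation then starts from pure random static and repeatedly subtracts the predicted noise, step by step, until a clear image emerges.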
Transformer-based models, like the Generative Pre-trained Transformer (GPT) model used by ChatGPT, focus on data sequence and context rather than individual data points on their own. In the case of language, for example, these models are trained on a large variety of text passages from books, as well as resources such as Wikipedia. They analyze where words tend to appear in sentences and how words are used in association with other words around them. This makes transformer-based models well adapted to predict what words might come up next in a sentence, to translate text, or to generate new text.
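The mechanism that lets these models weigh "the words around them" is called self-attention. The sketch below shows its core computation on a sequence of word vectors; a real transformer adds learned query, key, and value projections, multiple attention heads, and position information, all omitted here for brevity.

```python
import torch
import torch.nn.functional as F

def self_attention(tokens):
    """Scaled dot-product self-attention over token vectors of shape
    (sequence_length, dim). Each position rebuilds its representation
    as a relevance-weighted mixture of every position -- how the model
    captures which words matter to which."""
    dim = tokens.size(-1)
    # Real transformers derive queries, keys, and values from learned
    # projections of the tokens; identical copies keep the sketch minimal.
    q, k, v = tokens, tokens, tokens
    scores = q @ k.T / dim ** 0.5          # pairwise relevance of words
    weights = F.softmax(scores, dim=-1)    # normalize into attention weights
    return weights @ v                     # context-aware word vectors

# A language model adds one more learned layer (a hypothetical
# `vocab_projection` here) that turns the final position's vector into
# scores over the vocabulary -- i.e., a next-word prediction:
#   next_word_scores = vocab_projection(self_attention(tokens)[-1])
```

Stacking many such attention layers, trained to predict the next word across vast text corpora, is what gives models like GPT their fluency.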
While these applications sometimes make glaring mistakes (often referred to as hallucinations), they are being used for many purposes, such as product design, urban architecture, and health care.
However, developing generative AI models requires enormous computing power, which can be expensive. Huge amounts of data must be stored and processed during training, and running the finished applications also demands significant processing power. This has resulted in larger companies, such as Google and Microsoft-backed OpenAI, leading the way in application development.
The rise of generative AI also poses potential threats, including the spread of misinformation and the creation of deepfakes. As this technology becomes more sophisticated, ethicists warn that guidelines for its ethical use must be developed in parallel.