
Following GPT-3, OpenAI Released DALL·E and CLIP This Month.

6 min read · Jan 21, 2021

Not long ago, Ilya Sutskever, co-founder and chief scientist of OpenAI, wrote in the 2020 year-end special issue of The Batch, the weekly newsletter edited by Andrew Ng, that "in 2021, language models will begin to understand the visual world". At the beginning of this month, OpenAI announced two multimodal artificial intelligence systems, DALL·E and CLIP, once again exciting the AI community.

DALL·E: A 12-billion-parameter version of GPT-3 trained on a dataset of text-image pairs, which can generate a wide variety of images from text descriptions;
CLIP: A model that learns visual concepts efficiently from natural language supervision. You only need to provide the names of the visual categories to be recognized, so CLIP can be applied to a new visual classification task without task-specific training, similar to the "zero-shot" capabilities of GPT-2 and GPT-3.
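The zero-shot idea behind CLIP can be illustrated with a minimal sketch. It assumes an image and each candidate category name have already been mapped into a shared embedding space; the three-dimensional vectors, the prompt strings, the `zero_shot_classify` helper, and the `temperature` value below are all hypothetical stand-ins for illustration, not OpenAI's code or real CLIP outputs. Classification then reduces to picking the category text whose embedding is most similar to the image embedding.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_embedding, label_embeddings, temperature=100.0):
    """Score an image against text labels it was never explicitly trained on.

    temperature is an assumed scaling factor applied to the similarities
    before the softmax; it is a hypothetical choice for this sketch.
    """
    labels = list(label_embeddings)
    sims = [cosine(image_embedding, label_embeddings[name]) for name in labels]
    probs = softmax([temperature * s for s in sims])
    return dict(zip(labels, probs))

# Toy embeddings (hypothetical): a real system would obtain these from
# a text encoder and an image encoder trained together.
label_embeddings = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_embedding = [0.8, 0.2, 0.1]

probs = zero_shot_classify(image_embedding, label_embeddings)
best = max(probs, key=probs.get)
print(best)
```

Adding a new category in this sketch only requires adding another text entry to `label_embeddings`; no classifier is retrained, which is the sense in which the approach is "zero-shot".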

01. DALL·E

The name DALL·E is a portmanteau of the artist Salvador Dalí and Pixar's robot WALL·E. The name itself suggests machine imagination and the exploration of art. DALL·E is very similar to GPT-3: it is also a transformer language model. It receives both text and images as input and outputs the final generated image in multiple forms. It can modify the attributes of specific objects in an image, as you can see here. You can even control multiple objects and their…

Written by Jarvis+