OpenAI Debuts Text-To-Video Generator, Kicking AI Creativity Into Motion
By Mikelle Leow, 16 Feb 2024
Video screenshot via OpenAI
After conquering art with DALL-E and language with ChatGPT, OpenAI is taking on animation and footage with its latest artificially intelligent brainchild, Sora. Borrowing its name from the Japanese word for sky, Sora translates text prompts into videos that last up to 60 seconds.
The model is capable of animating multiple characters, executing specific movements, and painting scenes with accurate background details. With strong linguistic comprehension, it can supposedly interpret instructions and transform them into rich, smooth scenes. Under the hood, Sora is a diffusion model: it starts from what looks like static noise and removes that noise step by step until a coherent video emerges.
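The denoising loop described above can be sketched in miniature. This toy example is only an illustration of the reverse-diffusion idea, not Sora's actual architecture: the `predict_noise` "model" here is a stand-in that cheats by knowing the clean target, whereas a real diffusion model learns to predict the noise from training data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend "clean video": 4 frames of 8x8 pixels (purely illustrative shapes).
target = rng.random((4, 8, 8))

# Step 0: pure static noise, as in the article's description.
x = rng.standard_normal(target.shape)

def predict_noise(sample, clean):
    # Stand-in for a learned denoiser: it simply measures how far the
    # current sample is from the clean target. A real model has no
    # access to `clean` and must learn this mapping.
    return sample - clean

# Reverse diffusion: remove a fraction of the predicted noise each step,
# so the sample gradually resolves from static into a coherent result.
for t in range(50):
    eps_hat = predict_noise(x, target)
    x = x - 0.1 * eps_hat

print(float(np.abs(x - target).mean()))  # error shrinks toward zero
```

Each iteration leaves 90% of the remaining noise, so after 50 steps the sample is very close to the target; video models apply the same principle jointly across frames so that motion stays consistent over time.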
“We’re teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction,” explains OpenAI.
Prompt: “A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.”
Moreover, the tool employs the recaptioning technique from DALL-E 3 to produce highly descriptive captions for visual training data, enhancing its ability to adhere to text descriptions in video generation. The AI can breathe life into still images, extend existing videos, or fill in missing frames with impressive accuracy.
Prompt: “A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures.”
Prompt: “A close up view of a glass sphere that has a zen garden within it. There is a small dwarf in the sphere who is raking the zen garden and creating patterns in the sand.”
For now, Sora, or the sky, is not yet the limit. The organization acknowledges that the tool sometimes fumbles with complicated scenarios or nuances of cause and effect. For example, it might show someone biting into a cookie, but then the cookie looks untouched in the next frame.
The model can also confuse left and right, or struggle to depict events that unfold over time in the correct order.
Still, OpenAI views Sora as a crucial step towards achieving artificial general intelligence (AGI), acknowledging that the model is still evolving.
Prompt: “New York City submerged like Atlantis. Fish, whales, sea turtles and sharks swim through the streets of New York.”
Prompt: “A cat waking up its sleeping owner demanding breakfast. The owner tries to ignore the cat, but the cat tries new tactics and finally the owner pulls out a secret stash of treats from under the pillow to hold the cat off a little longer.”
Prompt: “Reflections in the window of a train traveling through the Tokyo suburbs.”
And while anyone is free to give ChatGPT and DALL-E 3 a whirl, Sora remains behind closed doors as OpenAI irons out potential risks and harms, like disinformation. At present, the company is working with cybersecurity professionals, as well as visual artists, designers, and filmmakers, to develop a more polished model.
Prompt: “Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle. The art style is 3D and realistic, with a focus on lighting and texture. The mood of the painting is one of wonder and curiosity, as the monster gazes at the flame with wide eyes and open mouth. Its pose and expression convey a sense of innocence and playfulness, as if it is exploring the world around it for the first time. The use of warm colors and dramatic lighting further enhances the cozy atmosphere of the image.”
Prompt: “Archeologists discover a generic plastic chair in the desert, excavating and dusting it with great care.”
[via The Guardian, The New York Times, CNN, videos and cover screenshot via OpenAI]