
OpenAI unveils text-to-video model and the results are astonishing. Take a look for yourself

Hollywood filmmakers, you may want to watch out for Sora.
Written by Sabrina Ortiz, Editor
Still frame from a video generated by Sora. OpenAI's prompt was, "The camera directly faces colorful buildings in burano italy. An adorable dalmation looks through a window on a building on the ground floor. Many people are walking and cycling along the canal streets in front of the buildings."

Image: OpenAI

OpenAI already has market-leading AI models in image and text generation with DALL-E 3 and ChatGPT, respectively. Now, the company is coming for the text-to-video generation space, too, with a brand-new model.

Also: The best AI image generators of 2024: Tested and reviewed

On Thursday, OpenAI unveiled Sora, its text-to-video model that can generate videos up to a minute long with impressive quality and detail, as seen in the demo video below:

Sora can tackle complex scenes, including multiple characters, specific types of motion, and fine detail, thanks to the model's deep understanding of language, prompts, and how its subjects exist in the physical world, according to OpenAI.

The demo videos show that OpenAI has managed to tackle two big issues in the video-generation space: continuity and clip length.

AI-generated videos are often choppy and distorted, making it obvious to the audience where every frame ends and begins. For example, Runway released its most advanced text-to-video model, Gen-2, in March 2023. As seen below, those clips don't quite compare to what OpenAI's model can produce today:

OpenAI's model, on the other hand, can generate fluid video, making each generated clip look like it was lifted from a Hollywood-produced film. 

Also: How to use ChatGPT

OpenAI says Sora is a diffusion model that produces high-quality output by using a transformer architecture similar to that of the GPT models, and by building on the company's past research on DALL-E and GPT. In addition to generating video from text, Sora can generate video from a still image or fill in missing frames in existing videos:

While showcasing the model's advancements, OpenAI also addresses its weaknesses, noting that Sora may struggle with "simulating the physics of a complex scene, and may not understand specific instances of cause and effect." The model can also confuse the spatial details of a prompt.

Sora is first being made available to red teamers to assess the model's risks, and to a select group of creatives, including visual artists, designers, and filmmakers, who will provide feedback on how to improve the model to meet their needs.

Also: I tried Microsoft Copilot's new AI image-generating feature, and it solves a real problem

It seems we are entering a new era in which companies shift their focus to researching, developing, and launching capable AI text-to-video generators. Just two weeks ago, Google Research published a paper on Lumiere, a text-to-video diffusion model that can also create highly realistic video.
