Last month, a handful of filmmakers got an early chance to try Sora, and OpenAI recently revealed the astonishing results of their experiments.
Compared with the demonstration videos OpenAI carefully selected six weeks ago to promote its latest generative model, these short films represent a huge leap. Here's how three filmmakers pulled it off.
"Airhead"
Shy Kids is a pop band and film production team based in Toronto, Canada, whose style has been described as "punk rock Pixar." The group has experimented with video generation technology before: in 2023, it used an open-source tool called Stable Warpfusion to produce a music video for one of its songs. The end result was cool, but it was low-resolution and full of flaws.
However, the short film titled "Air Head," produced using Sora, was almost indistinguishable from reality, except for the protagonist who had a balloon for a head.
The biggest drawback of most video generation tools is that they struggle to maintain consistency between frames. When OpenAI invited Shy Kids to try Sora, the team was curious to see how far they could push it.
"We thought a fun and interesting experiment with Sora would be to see if we could create a character with consistency," said Walter Woodman, a member of Shy Kids. "We think it was basically successful."
Generative models have difficulty with the details of body parts such as hands and faces. Yet in the video there is a scene showing a train carriage full of passengers with almost perfect faces. "Those faces on the train were generated by Sora, and what it can do is astonishing," Woodman said. Does this mean the face and hand problems in AI-generated video have been solved? Not quite; distorted body parts are still visible in places.
Text in generated video is also a problem. In another video, by the creative agency Native Foreign, a bicycle repair shop's sign is misspelled "Bicycle Repaich." But all of the content in "Air Head" is Sora's original output.
After splicing together many different clips made by the tool, Shy Kids applied a series of post-processing steps to make the film look better. For example, they used visual effects tools to fix some shots of the protagonist's balloon face.
Woodman also believes that music and voice-over lift the quality of the short film, so the team created their own and added them to the video. Blending these human-made elements with Sora's output, he said, is what brings the film to life.
"Without humans, this technology is nothing," he said. "It's a powerful tool, but you are its soul.""Abstract"
Artist and filmmaker Paul Trillo wanted to challenge Sora on the "feeling of cinema." His video, shot in a vintage film style, shows across several clips how a person covered in sequins transforms into a glittering ball and a breakdancing garbage man.
He said everything you see is Sora's original output: "There is no color correction or post-production effects." Even the jump cuts in the first part of the short film were made with the model.
Trillo felt the demonstrations OpenAI released last month looked too much like video game footage. "I wanted to see what possibilities other styles might open up," he said. The finished product is a short film that looks as if it were shot on old-fashioned 16mm film. "It took a lot of experimentation, but I stumbled upon a series of prompts that help make the video feel more organic, more cinematic," he said.
"Beyond Our Reality"
Don Allen Stevenson III is a filmmaker and visual effects artist. A few years ago, he and several other artists were invited by OpenAI to try out its text-to-image model, DALL-E 2.
Stevenson's short film is a National Geographic-style nature documentary that introduces us to a zoo of imagined animals, including such strange species as giraffe flamingos, flying pigs, and eel cats. In many ways, Stevenson said, using a text-to-video model is like using a text-to-image model: "You just input a text prompt and then keep adjusting it."
There is one tricky problem, though. While you are trying out different prompts, Sora generates low-resolution video. Once you find a clip you like, you can choose to increase the resolution.
But going from low resolution to high resolution requires a new round of generation, which can lose the details you liked in the low-resolution version.
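Sora's interface is not public, so no real API is shown here, but the draft-then-upscale loop Stevenson describes can be sketched roughly as below. Everything in the sketch is a hypothetical placeholder: the `client` object, its `generate` method, and the resolution values are invented for illustration and are not OpenAI's actual interface.

```python
# Hypothetical sketch of the workflow described above: iterate on prompts at low
# resolution, then re-generate a chosen clip at high resolution. The `client`,
# its `generate` method, and the parameters are invented placeholders, not a real API.

def explore(client, prompt_variants):
    """Draft pass: cheap, low-resolution previews for each prompt variant."""
    return [(p, client.generate(prompt=p, resolution="480p")) for p in prompt_variants]

def finalize(client, chosen_prompt):
    """Upscaling is a fresh generation, so details seen in the draft may shift or disappear."""
    return client.generate(prompt=chosen_prompt, resolution="1080p")
```

The point of the sketch is the last comment: because the high-resolution pass is a new sample rather than a deterministic upscale of the draft, the result can differ from the clip you picked, which is the problem Stevenson describes next.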
Stevenson said that sometimes the perspective may change, or objects in the frame may move. At the same time, like other generative models, Sora also has hallucination issues.
For images, this may mean strange visual flaws. For videos, these defects may appear over time, such as strange jumps between frames.
Stevenson also had to phrase things in language Sora "could understand." The prompts had to be very straightforward, completely literal, he said. In one experiment he wanted a close-up shot of a helicopter, but the short film Sora produced mixed the helicopter and the camera's zoom lens together.
Still, Stevenson said, with enough creative prompting Sora is more controllable than earlier models.
Even so, he thinks what makes the technology fun to use is that it always surprises you: "I like the lack of control, the chaos of it. If you want to control editing and visual effects, there are plenty of other video production tools that can do the job."
For Stevenson, the main reason to use a generative model like Sora is to get strange, unexpected material.
In "Beyond Our Reality," the strange animals are all generated by Sora. Stevenson tried many different prompts until the tool generated something he liked.He said, "I am its director, but more like a force." He would revise repeatedly, constantly trying various changes.
For example, the fox-crow Stevenson envisioned had four legs, but Sora gave it only two. He thought that actually looked better, though it still was not perfect: sharp-eyed viewers will notice that at one point in the video the fox-crow switches from two legs to four, then back again.
Sora also made several versions of the video, which he thought looked too creepy to use.
When he had collected the strange creatures he really liked, he edited them together, then added subtitles and voiceovers.
Although Stevenson could have created his fictional zoo with existing tools, he said it would have taken hours, even days. With Sora, the process was much faster. "I tried to come up with something that looks cool and experimented with a lot of different characters," he said. "I have many video clips that include random creatures."
When he saw the giraffe flamingo generated by Sora, he realized he could do more. He said, "I started to think: What is the story of this creature? What does it eat? Where does it live?" He plans to release a series of expanded short films that introduce each fantasy animal in more detail.
Stevenson also hopes his fantasy animals will prompt bigger questions. "There is going to be a lot of new kinds of content on social media," he said. "How will we tell people what is real? In my opinion, one way is to tell stories that are obviously fantasy."
Stevenson pointed out that his short film may be the first video created by a generative model that many people ever see. He hopes its first impression makes one thing clear: this is not real.