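Over the past few days, news about Sora has been everywhere, and many people have already seen the stunning clips it generated.
But I think few people have taken the time to sit down and analyze the technical report OpenAI released and the technology behind it.
Over the next few days I will work through a detailed reading of Sora's technical report, to get up to speed on this technology as quickly as possible.
At present, all information about Sora comes from OpenAI's official website.
One source is the product introduction, and the other is the technical report (the report does not disclose the detailed technical approach; it only gives a general overview of the principles and terminology).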
Let's start with OpenAI's product introduction.
OpenAI defines Sora as follows:
Sora is an AI model that can create realistic and imaginative scenes from text instructions.
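Judging from the videos released so far, it delivers on both counts: realistic and imaginative.
Sora: a model that understands real-world interaction
In this section OpenAI presents 9 videos. Let's go through them one by one.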
Prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.
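This video comes first mainly because it runs a full minute, the longest coherent video AI had generated at the time. The street background looks natural, the passersby move smoothly and consistently, and even the reflections in the puddles are largely consistent with the real world. At the 36-second mark the AI cuts to a new shot on its own, yet the main character and the background stay the same, showing strong `consistency`. The reflections in her sunglasses also correctly mirror the surrounding scene, as if the AI had actually designed a complete space.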
Prompt: Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field.
The second video renders the mammoths' fur in fine, natural detail, far better than a lot of cheap special effects.
Prompt: A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.
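The highlight of the third video is its `cinematic feel`: Sora correctly constructs the protagonist and the setting, and uses scene transitions and camera movement to deliver a shot with real artistic quality.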
Prompt: Drone view of waves crashing against the rugged cliffs along Big Sur’s garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff’s edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff’s edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.
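What stands out most in the fourth video is its `realism`. I watched it several times and could not spot a single flaw; it looks no different from an ordinary landscape video. The camera movement is very smooth, and the main landscape never jitters or shifts, showing a high degree of consistency.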
Prompt: Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle. The art style is 3D and realistic, with a focus on lighting and texture. The mood of the painting is one of wonder and curiosity, as the monster gazes at the flame with wide eyes and open mouth. Its pose and expression convey a sense of innocence and playfulness, as if it is exploring the world around it for the first time. The use of warm colors and dramatic lighting further enhances the cozy atmosphere of the image.
The fifth video conveys emotion correctly: wonder, curiosity, innocence, and playfulness all come through vividly.
Prompt: gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures.
The sixth video shows Sora's creativity: starting from a real coral reef, it creates a papercraft world and renders it vividly.
Prompt: This close-up shot of a Victoria crowned pigeon showcases its striking blue plumage and red chest. Its crest is made of delicate, lacy feathers, while its eye is a striking red color. The bird’s head is tilted slightly to the side, giving the impression of it looking regal and majestic. The background is blurred, drawing attention to the bird’s striking appearance.
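I won't say much about the seventh video. I looked up what a Victoria crowned pigeon looks like on Baidu, and it really does look like this. Maybe 3D modeling really won't be needed anymore.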

Prompt: Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee.
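In the eighth video, watch how the ships and the water move: the motion is coherent and consistent with physics. This is probably what OpenAI means by `understand and simulate the physical world in motion`.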
Prompt: A young man at his 20s is sitting on a piece of cloud in the sky, reading a book.
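I didn't find anything especially stunning in the last video, and some unnatural traces are visible, but it is already quite polished.
Sora: a model with an artistic touch
According to OpenAI, they have given visual artists, designers, and filmmakers access and keep adjusting the model based on their feedback. With help from professionals, Sora may well become even more artistic. In this section OpenAI presents 8 videos in different styles, and the photorealistic, creative, and animated styles all stand out.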
Prompt: Historical footage of California during the gold rush.
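The first video feels like a historical documentary: the scene is huge yet rich in detail, and I didn't see any two-legged 🐴 or four-legged people.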
Prompt: A close up view of a glass sphere that has a zen garden within it. There is a small dwarf in the sphere who is raking the zen garden and creating patterns in the sand.
The keyword in the second video is `glass sphere`. The reflections and edges of the glass sphere stay sharp, and there are no visible glitches while the camera pans.
Prompt: Extreme close up of a 24 year old woman’s eye blinking, standing in Marrakech during magic hour, cinematic film shot in 70mm, depth of field, vivid colors, cinematic
The third video's handling of skin detail is practically perfect; I couldn't spot a single flaw.
Prompt: A cartoon kangaroo disco dances.
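The fourth video is in a cartoon style. The character's expressions and movements are natural, and I can't see any sign of AI generation either. Will the cartoons kids watch in the future all be generated by AI? 😭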
Prompt: A beautiful homemade video showing the people of Lagos, Nigeria in the year 2056. Shot with a mobile phone camera.
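The fifth video imitates the look of phone footage very convincingly and blends Nigerian characteristics with a vision of 2056 nicely. The AI seems fairly optimistic about the future: it didn't imagine the world as a post-apocalyptic wasteland. The one weakness is that the main character's gaze and movements aren't very natural; I wonder whether refining the prompt would improve this.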
Prompt: A petri dish with a bamboo forest growing within it that has tiny red pandas running around.
The sixth video is a creative film in a photorealistic style. Could small ads in the future all be made with Sora?
Prompt: The camera rotates around a large stack of vintage televisions all showing different programs — 1950s sci-fi movies, horror movies, news, static, a 1970s sitcom, etc, set inside a large New York museum gallery.
The seventh video shows a stack of TVs, each playing different content without interfering with the others. Incredibly realistic.
Prompt: 3D animation of a small, round, fluffy creature with big, expressive eyes explores a vibrant, enchanted forest. The creature, a whimsical blend of a rabbit and a squirrel, has soft blue fur and a bushy, striped tail. It hops along a sparkling stream, its eyes wide with wonder. The forest is alive with magical elements: flowers that glow and change colors, trees with leaves in shades of purple and silver, and small floating lights that resemble fireflies. The creature stops to interact playfully with a group of tiny, fairy-like beings dancing around a mushroom ring. The creature looks up in awe at a large, glowing tree that seems to be the heart of the forest.
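The fur and the lighting in the eighth video are fascinating. I wonder whether Disney and Pixar cried themselves faint in the restroom after watching it.
Sora: a model that knows the laws of physics
Sora can generate videos with multiple characters, specific types of motion, and detailed subjects and backgrounds. It understands not only what the user asks for in the prompt but also how those things behave in the real world. OpenAI gives 8 videos in this section.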
Prompt: The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope, dust kicks up from it’s tires, the sunlight shines on the SUV as it speeds along the dirt road, casting a warm glow over the scene. The dirt road curves gently into the distance, with no other cars or vehicles in sight. The trees on either side of the road are redwoods, with patches of greenery scattered throughout. The car is seen from the rear following the curve with ease, making it seem as if it is on a rugged drive through the rugged terrain. The dirt road itself is surrounded by steep hills and mountains, with a clear blue sky above with wispy clouds.
The first video demonstrates the ability to generate an entire world from a single text description.
Prompt: Reflections in the window of a train traveling through the Tokyo suburbs.
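I assumed the second video would simply render reflections on the glass, but at the 4-second mark Sora adds a flourish of its own: as a pillar flashes past, it shows us a clear reflection of a person, exactly as it would appear in the real world. Perhaps Sora really does understand the world.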

Prompt: A drone camera circles around a beautiful historic church built on a rocky outcropping along the Amalfi Coast, the view showcases historic and magnificent architectural details and tiered pathways and patios, waves are seen crashing against the rocks below as the view overlooks the horizon of the coastal waters and hilly landscapes of the Amalfi Coast Italy, several distant people are seen walking and enjoying vistas on patios of the dramatic ocean views, the warm glow of the afternoon sun creates a magical and romantic feeling to the scene, the view is stunning captured with beautiful photography.
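In the third video Sora lays out the scene exactly as requested. I've also noticed that Sora's landscape videos are flawless. Is that because landscape photos are so abundant, especially images and videos of popular tourist spots like this one, that the training works especially well?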
Prompt: A large orange octopus is seen resting on the bottom of the ocean floor, blending in with the sandy and rocky terrain. Its tentacles are spread out around its body, and its eyes are closed. The octopus is unaware of a king crab that is crawling towards it from behind a rock, its claws raised and ready to attack. The crab is brown and spiny, with long legs and antennae. The scene is captured from a wide angle, showing the vastness and depth of the ocean. The water is clear and blue, with rays of sunlight filtering through. The shot is sharp and crisp, with a high dynamic range. The octopus and the crab are in focus, while the background is slightly blurred, creating a depth of field effect.
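In the fourth video Sora handles the interaction between the two protagonists, the large octopus and the king crab, very well, which suggests it can control multiple subjects quite precisely.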
Prompt: A cat waking up its sleeping owner demanding breakfast. The owner tries to ignore the cat, but the cat tries new tactics and finally the owner pulls out a secret stash of treats from under the pillow to hold the cat off a little longer.
The fifth video again features multiple protagonists, but it has noticeable glitches, for example in the owner's hand and the cat's paws.
Prompt: A flock of paper airplanes flutters through a dense jungle, weaving around trees as if they were migrating birds.
The background in the sixth video is quite complex, yet the generated video is still very smooth.
Prompt: Borneo wildlife on the Kinabatangan River
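The seventh video is another landscape video. I've never been to Borneo, so if Sora says there are strange birds like that there, I'll take its word for it. Overall, Sora's nature and landscape videos are better than its videos of people.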
Prompt: A Chinese Lunar New Year celebration video with Chinese Dragon.
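In the eighth video, the NPCs are worth a close look: the people in the background crowd all move along with the dragon dance in the middle, especially the woman in yellow in the lower right, who points her phone at the dragon and snaps away furiously. It's really fun.
To be continued
Since there is a lot to cover, I'm splitting this into several parts. In the next part, we'll look at Sora's `failure videos`.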

A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about. --ar 16:9 --v 6.0
Image generated with Midjourney; the text above is the prompt used, with MJ version 6.0.
Technology is moving so fast.