Open-Source AI Video Generation: Hands-On Deployment of Pyramid Flow. Can It Rival Sora?


This is Houge's 122nd post. Thanks for following along.

A while ago, I added text-to-video capability to my WeChat AI assistant:

I Connected the "Chinese Sora" to "Xiao Ai": Try It for Free

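That feature is built on CogVideo, open-sourced by Zhipu. A new video generation model, pyramid-flow-sd3, came out recently, and community feedback suggests it produces better results than CogVideo.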

Today's post walks through a local deployment step by step, then tests whether the model is as impressive as advertised.

Introduction to Pyramid Flow

Project repository: https://github.com/jy0205/Pyramid-Flow

As usual, a quick introduction first.

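What are Pyramid Flow's highlights?

Only 2B parameters, yet it can generate 10-second videos at 768p resolution and 24 fps;
Supports both text-to-video and image-to-video;
Autoregressive generation: later frames are predicted from earlier ones, which keeps the video coherent and smooth;
A pyramid-style multi-scale architecture that interpolates between latents at different resolutions, which makes generation more efficient.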

Official evaluation results: apart from the semantic score, Pyramid Flow beats the open-source CogVideo on every metric:

[Image: official evaluation results]

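Online Demo

Online demo: https://huggingface.co/spaces/Pyramid-Flow/pyramid-flow

Pyramid Flow is already live on Hugging Face, so you can try it online right away without any local deployment.

If you cannot access it, see the official generated samples: https://pyramid-flow.github.io/

Next, let's get the model running locally.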

Local Deployment

3.1 Environment Setup

First, set up the Pyramid Flow environment:

  
git clone https://github.com/jy0205/Pyramid-Flow  
cd Pyramid-Flow  
conda create -n pyramid python==3.8.10  
conda activate pyramid  
pip install -r requirements.txt

Next, download the model weights to a local directory so they are easy to load:

  
export HF_ENDPOINT=https://hf-mirror.com  
huggingface-cli download rain1011/pyramid-flow-sd3 --local-dir ckpts/

The model weights come in two versions, 768p and 384p. The 384p version supports 5-second videos at 24 FPS, while the 768p version can generate up to 10 seconds.
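Before loading anything, it can be worth confirming that the download produced the variant you plan to use. The sketch below is a hypothetical helper (`find_variants` is not part of Pyramid Flow). It assumes each variant sits in a subdirectory of `ckpts/` named after its `model_variant` string, such as `diffusion_transformer_384p`; the 768p name is inferred by analogy.

```python
from pathlib import Path

# Assumed subdirectory names, mirroring the `model_variant` strings.
EXPECTED_VARIANTS = ("diffusion_transformer_384p", "diffusion_transformer_768p")


def find_variants(ckpt_dir="ckpts/"):
    """Return the expected variant directories that exist under ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in EXPECTED_VARIANTS if (root / name).is_dir()]


if __name__ == "__main__":
    found = find_variants()
    missing = [name for name in EXPECTED_VARIANTS if name not in found]
    print("found:", found)
    print("missing:", missing)
```

If the variant you want shows up under `missing`, re-run the download before moving on to inference.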

3.2 Inference Test

First, load the model:

  
import os

# Select the GPU before importing torch so the setting is sure to take effect.
# '2' is the device index used in this test; change it to match your machine.
os.environ['CUDA_VISIBLE_DEVICES'] = '2'

import torch
from PIL import Image
from pyramid_dit import PyramidDiTForVideoGeneration
from diffusers.utils import export_to_video

# Load the 384p variant from the local checkpoint directory in bf16
model = PyramidDiTForVideoGeneration('ckpts/', 'bf16', model_variant='diffusion_transformer_384p')

model.vae.enable_tiling()
# model.vae.to("cuda")
# model.dit.to("cuda")
# model.text_encoder.to("cuda")
# If you're not using sequential offloading below, uncomment the lines above ^
model.enable_sequential_cpu_offload()

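Loading the whole model onto the GPU requires at least 19 GB of VRAM. With less than that, stick with the sequential CPU offloading used in the code above.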
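The choice between the two loading modes can be made programmatically. This is a minimal sketch, not part of the Pyramid Flow codebase: `choose_placement` and `free_gpu_bytes` are hypothetical helpers, and the 19G figure quoted above is treated as 19 GiB, which is an assumption.

```python
GIB = 1024 ** 3
FULL_GPU_GIB = 19  # figure quoted in this article, read here as GiB (assumption)


def choose_placement(free_bytes, required_gib=FULL_GPU_GIB):
    """Return "gpu" if free memory covers the requirement, else "offload"."""
    return "gpu" if free_bytes >= required_gib * GIB else "offload"


def free_gpu_bytes():
    """Free memory on the current CUDA device, or 0 if torch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return 0
    if not torch.cuda.is_available():
        return 0
    free, _total = torch.cuda.mem_get_info()
    return free


if __name__ == "__main__":
    print(choose_placement(free_gpu_bytes()))
```

If this prints `gpu`, uncomment the three `.to("cuda")` lines and drop the `enable_sequential_cpu_offload()` call; otherwise keep the offloading call as is.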

Next, test text-to-video generation:

  
def t2v():
    prompt = "A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors"
    with torch.no_grad(), torch.amp.autocast('cuda', dtype=torch.bfloat16):
        frames = model.generate(
            prompt=prompt,
            num_inference_steps=[20, 20, 20],
            video_num_inference_steps=[10, 10, 10],
            height=384,
            width=640,
            temp=16,                    # temp=16: 5s, temp=31: 10s
            guidance_scale=7.0,         # The guidance for the first frame: 7 for the 384p variant loaded above (the original snippet had 9.0)
            video_guidance_scale=5.0,   # The guidance for the other video latent
            output_type="pil",
            save_memory=True,           # If you have enough GPU memory, set it to `False` to improve vae decoding speed
        )
    export_to_video(frames, "./text_to_video_sample.mp4", fps=24)
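The `temp` comment maps 16 to about 5 s and 31 to about 10 s. One reading that fits both numbers is that every latent step after the first decodes to 8 frames, so `frames = 8 * (temp - 1) + 1`. That formula is an assumption, not something stated in this post; the sketch only checks that it agrees with the comment at 24 fps.

```python
FPS = 24
FRAMES_PER_LATENT = 8  # assumed temporal compression factor


def frames_for_temp(temp):
    """Frame count under the assumed formula 8 * (temp - 1) + 1."""
    return FRAMES_PER_LATENT * (temp - 1) + 1


def duration_seconds(temp, fps=FPS):
    """Clip length in seconds for a given temp at the given frame rate."""
    return frames_for_temp(temp) / fps


for temp in (16, 31):
    print(temp, frames_for_temp(temp), round(duration_seconds(temp), 2))
# 16 121 5.04
# 31 241 10.04
```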

Then test image-to-video generation:

  
def i2v():
    image = Image.open('assets/the_great_wall.jpg').convert("RGB").resize((640, 384))
    prompt = "FPV flying over the Great Wall"
    with torch.no_grad(), torch.amp.autocast('cuda', dtype=torch.bfloat16):
        frames = model.generate_i2v(
            prompt=prompt,
            input_image=image,
            num_inference_steps=[10, 10, 10],
            temp=16,
            video_guidance_scale=4.0,
            output_type="pil",
            save_memory=True,           # If you have enough GPU memory, set it to `False` to improve vae decoding speed
        )
    export_to_video(frames, "./image_to_video_sample.mp4", fps=24)

Pyramid Flow is demanding on VRAM. Without enough of it, generating a 5-second video took at least 13 minutes in my test:

  
100%|████| 16/16 [13:11<00:00, 49.45s/it]
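The numbers in the progress bar are consistent with each other: 16 iterations at 49.45 s each come to about 791 s, which is the 13:11 shown. A quick check (the helper names are made up for illustration):

```python
def total_seconds(iterations, seconds_per_iteration):
    """Total wall-clock time for a run with a constant per-iteration cost."""
    return iterations * seconds_per_iteration


def as_min_sec(seconds):
    """Format a duration in seconds as M:SS, dropping fractional seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


total = total_seconds(16, 49.45)
print(as_min_sec(total))  # 13:11
```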

How does the output look?

In my tests, it did not really pull ahead of CogVideo.

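Final Thoughts

This post walked through deploying and testing Pyramid Flow, the latest open-source video generation model, on a local machine.

AI applications fall roughly into four areas: text, speech, images, and video. Of these, speech has already been conquered by the "silicon-based life forms".

As for AI video generation, judging by current results... there is still a long way to go!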
