Alibaba ships again: an open-source 7B text-to-image model built for text inside images, with results rivaling 20B+ models

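Alibaba has been busy in open source lately. Besides releasing the 6B Z-Image-Turbo, it has also open-sourced Ovis-Image, a 7B-parameter text-to-image model focused on rendering text inside images.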

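Its strength is in-image text rendering: generating posters, promotional graphics, logos, UI mockups, and infographics, where the text has to be legible, stably laid out, and properly aligned.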

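As most readers know, the vast majority of text-to-image models, open-source ones in particular, tend to render text that looks like incantations, garbled characters, or squiggles.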

Surprisingly, Alibaba's AIDC-AI team has reached results that go head-to-head with 20B+ models using only 7B parameters.

Key highlights

1. Text rendering on par with much larger models

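While mainstream image models still struggle with familiar problems such as warped text, inaccurate fonts, and breakdowns on mixed Chinese-English input, Ovis-Image tackles Chinese, English, font styles, and text-image integration all at once.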

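Judging from the official demos, both Chinese and English text render clearly and in the requested font style. Characters are not warped or collapsed, and font family, weight, size, and aspect ratio are all controllable.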

The key point is that it has only 7B parameters.

That is smaller than many models people already deploy locally, yet its text accuracy is comparable to models in the 20B to 30B range.

2. Strong results on established benchmarks

CVTG-2K text rendering benchmark: 92% average accuracy

• GPT-4o: 85%
• Qwen-Image: 82%

That puts it 7 and 10 percentage points ahead of two heavyweight models.
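
For context on what an accuracy figure like this measures: text-rendering benchmarks typically run OCR on the generated image and compare the recognized words with the words the prompt asked for. The sketch below shows one simple word-level variant. It is an illustration under that assumption, not the official CVTG-2K scoring code; the function `word_accuracy` and the sample words are hypothetical.

```python
from collections import Counter


def word_accuracy(target_words, ocr_words):
    """Fraction of target words found in the OCR output (case-insensitive).

    Each OCR word can satisfy at most one target word, so a repeated
    word must be rendered the right number of times to count fully.
    """
    if not target_words:
        return 0.0
    remaining = Counter(w.lower() for w in ocr_words)
    hits = 0
    for word in target_words:
        key = word.lower()
        if remaining[key] > 0:
            remaining[key] -= 1
            hits += 1
    return hits / len(target_words)


# Hypothetical case: four words requested, OCR reads one of them wrong
# ("OPENlNG" has a lowercase L where the I should be).
target = ["GRAND", "OPENING", "THIS", "FRIDAY"]
recognized = ["GRAND", "OPENlNG", "THIS", "FRIDAY"]
print(word_accuracy(target, recognized))  # 0.75
```

A single wrong glyph fails the whole word under this kind of metric, which is why scores in the 80s and 90s are hard to reach.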

LongText-Bench long-text rendering

• English: 92.2% (slightly below GPT-4o's 95.6%)
• Chinese: 96.4% (above Qwen-Image's 94.6%)

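It is especially strong on very long Chinese text: the dense layouts typical of posters, banners, infographics, and promotional pages come out stable and legible, with few wrong characters.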

For design, e-commerce, and brand teams in China, that capability is highly valuable.

Quick start

The team provides an online Gradio demo, so you can try Ovis-Image directly in the browser.

To install it locally, run the following commands:

git clone git@github.com:AIDC-AI/Ovis-Image.git  
conda create -n ovis-image python=3.10 -y  
conda activate ovis-image  
cd Ovis-Image  
pip install -r requirements.txt  
pip install -e .
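
After the steps above, a quick sanity check can confirm that the active interpreter matches the conda environment and that the editable install is importable. This is a minimal sketch: the helper `check_environment` is hypothetical, and the package name `ovis_image` is an assumption inferred from the `ovis_image/test.py` path used in the inference command.

```python
import importlib.util
import sys


def check_environment(package="ovis_image", expected=(3, 10)):
    """Return (version_ok, package_ok) for the current interpreter.

    version_ok: interpreter major/minor matches the conda env (3.10).
    package_ok: the package can be located without importing it.
    """
    version_ok = sys.version_info[:2] == expected
    package_ok = importlib.util.find_spec(package) is not None
    return version_ok, package_ok


version_ok, package_ok = check_environment()
print(f"Python 3.10 active: {version_ok}")
print(f"ovis_image importable: {package_ok}")
```

If either value is `False`, the usual causes are a conda environment that was not activated or `pip install -e .` run outside the repository directory.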

To generate an image from text, run:

python ovis_image/test.py \
    --model_path AIDC-AI/Ovis-Image-7B/ovis_image.safetensors \
    --vae_path AIDC-AI/Ovis-Image-7B/ae.safetensors \
    --ovis_path AIDC-AI/Ovis-Image-7B/Ovis2.5-2B \
    --image_size 1024 \
    --denoising_steps 50 \
    --cfg_scale 5.0 \
    --prompt "A creative 3D artistic render where the text \"OVIS-IMAGE\" is written in a bold, expressive handwritten brush style using thick, wet oil paint. The paint is a mix of vibrant rainbow colors (red, blue, yellow) swirling together like toothpaste or impasto art. You can see the ridges of the brush bristles and the glossy, wet texture of the paint. The background is a clean artist's canvas. Dynamic lighting creates soft shadows behind the floating paint strokes. Colorful, expressive, tactile texture, 4k detail."
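
Prompts that contain quoted text are easy to get wrong when escaped by hand in the shell. One option is to build the same command as an argument list in Python and let `shlex` handle the quoting. This is a sketch rather than part of the official tooling: `build_ovis_command` and its `weights_dir` parameter are hypothetical, and it uses only the flags and default values from the command above.

```python
import shlex


def build_ovis_command(prompt, weights_dir="AIDC-AI/Ovis-Image-7B",
                       image_size=1024, denoising_steps=50, cfg_scale=5.0):
    """Return the argument list for ovis_image/test.py.

    weights_dir is assumed to hold the three checkpoints named in the
    original command (ovis_image.safetensors, ae.safetensors, Ovis2.5-2B).
    """
    return [
        "python", "ovis_image/test.py",
        "--model_path", f"{weights_dir}/ovis_image.safetensors",
        "--vae_path", f"{weights_dir}/ae.safetensors",
        "--ovis_path", f"{weights_dir}/Ovis2.5-2B",
        "--image_size", str(image_size),
        "--denoising_steps", str(denoising_steps),
        "--cfg_scale", str(cfg_scale),
        "--prompt", prompt,
    ]


cmd = build_ovis_command('A poster with the text "OVIS-IMAGE" in bold brush strokes')
# shlex.join produces a copy-pasteable, correctly quoted shell line.
print(shlex.join(cmd))
# To execute from the repository root instead of printing:
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the list straight to `subprocess.run` bypasses the shell entirely, so quotes inside the prompt need no escaping at all.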

To recap, here are the highlights of Ovis-Image: