https://github.com/IDEA-Research/Grounded-Segment-Anything
After just a few days of iteration, Grounded-SAM has received its second wave of updates. This time we went all in and brought together foundation models from four domains: Whisper, ChatGPT, Stable Diffusion, and Segment Anything. Together with @CiaoHe @隆啊隆 @Sakura.D @十指透兮的阳光, I built a fully automated visual tool that you drive entirely by voice, without lifting a finger.
The second major update comes mainly from @CiaoHe and @隆啊隆.
Just imagine: in the future, every visual workflow task could be completed through voice interaction alone. What a wonderful thing that would be!
The core idea behind this project is to combine the strengths of different models to build a powerful pipeline for solving complex problems. It is worth noting that this is a workflow for composing strong expert models: every part can be used separately or in combination, and each can be replaced with a similar model (e.g. swapping Grounding DINO for GLIP or another detector, swapping Stable Diffusion for ControlNet or GLIGEN, or combining the whole thing with ChatGPT).
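As a rough illustration of that modularity, here is a minimal, hypothetical sketch (none of these function names come from the repo) of how the stages compose, with each stage swappable for an equivalent model:

```python
# Illustrative interface sketch only -- these stubs are NOT the repo's code.
# Each stage is a plain function, so any stage can be swapped for an
# equivalent model (e.g. GLIP instead of Grounding DINO).

def detect(image, text_prompt):
    """Open-set detector: text prompt in, boxes + labels out (e.g. Grounding DINO)."""
    ...

def segment(image, boxes):
    """Promptable segmenter: boxes in, masks out (e.g. SAM)."""
    ...

def inpaint(image, masks, text_prompt):
    """Generator: repaint the masked regions from text (e.g. Stable Diffusion)."""
    ...

def pipeline(image, detect_prompt, inpaint_prompt):
    boxes, _labels = detect(image, detect_prompt)
    masks = segment(image, boxes)
    return inpaint(image, masks, inpaint_prompt)
```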
- Segment Anything is a strong segmentation model, but it needs prompts (such as boxes or points) to generate masks.
- Grounding DINO is a strong zero-shot detector, capable of generating high-quality boxes and labels from free-form text.
- The combination of Grounding DINO + SAM can detect and segment everything, at any level of granularity, from text inputs (see the first sketch after this list)!
- The combination of BLIP + Grounding DINO + SAM gives an automatic labeling system!
- The combination of Grounding DINO + SAM + Stable Diffusion acts as a data factory, generating new data!
- The combination of Whisper + Grounding DINO + SAM can detect and segment anything with speech (see the Whisper sketch below)!
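To make the central combination concrete, here is a hedged sketch of Grounding DINO + SAM based on the two projects' public Python APIs; the checkpoint paths, image name, text prompt, and thresholds are placeholders rather than the repo's exact demo settings:

```python
import torch
from torchvision.ops import box_convert
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import sam_model_registry, SamPredictor

# Load both expert models (config/checkpoint paths are placeholders).
dino = load_model("GroundingDINO_SwinT_OGC.py", "groundingdino_swint_ogc.pth")
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# 1) Grounding DINO: free-form text -> boxes.
image_source, image = load_image("demo.jpg")
boxes, logits, phrases = predict(
    model=dino, image=image, caption="dog . cat .",
    box_threshold=0.35, text_threshold=0.25,
)

# 2) SAM: boxes -> masks. Grounding DINO returns normalized cxcywh boxes,
# so convert them to absolute xyxy pixel coordinates first.
h, w = image_source.shape[:2]
boxes_xyxy = box_convert(
    boxes=boxes * torch.tensor([w, h, w, h]),
    in_fmt="cxcywh", out_fmt="xyxy",
).numpy()

predictor.set_image(image_source)  # RGB HxWx3 uint8 array
for box, phrase in zip(boxes_xyxy, phrases):
    masks, scores, _ = predictor.predict(box=box, multimask_output=False)
    print(phrase, masks.shape)  # one binary mask per detected phrase
```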
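And for the speech entry point, a minimal sketch of how Whisper can turn an audio command into the text prompt that feeds the pipeline above (the audio filename is a placeholder):

```python
import whisper

# Transcribe a spoken command into text with Whisper.
model = whisper.load_model("base")
result = model.transcribe("command.wav")  # e.g. a recording of "the running dog"
text_prompt = result["text"].strip()

# text_prompt can now be passed as `caption` to Grounding DINO,
# making the whole detect-and-segment pipeline voice-driven.
```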