Large Model Series | Grounded-SAM: A Fully Automated, Voice-Driven Vision Tool


https://github.com/IDEA-Research/Grounded-Segment-Anything

After a few days of iteration, Grounded-SAM has received its second wave of updates. This time we went all in, assembling Foundation Models from four domains: Whisper, ChatGPT, Stable Diffusion, and Segment Anything. Together with @CiaoHe @隆啊隆 @Sakura.D @十指透兮的阳光, I built a fully automated vision tool that you can drive with your voice alone, no hands required.

The second round of updates comes mainly from @CiaoHe and @隆啊隆.


Imagine a future where every vision-workflow task can be completed through voice interaction alone. What a remarkable thing that would be!


The core idea behind this project is to combine the strengths of different models to build a powerful pipeline for solving complex problems. It is worth emphasizing that this is a workflow for composing strong expert models: every part can be used separately or in combination, and each can be swapped for a similar model (e.g. replacing Grounding DINO with GLIP or another detector, replacing Stable Diffusion with ControlNet or GLIGEN, or combining the whole pipeline with ChatGPT).
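The modularity described above can be sketched as a pair of small interchangeable interfaces. The class and function names below are hypothetical illustrations of the design, not the repository's actual API:

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Box:
    # Absolute pixel coordinates plus the text phrase that matched.
    x0: float
    y0: float
    x1: float
    y1: float
    label: str


class Detector(Protocol):
    """Any text-prompted detector (Grounding DINO, GLIP, ...) fits this shape."""
    def detect(self, image, prompt: str) -> List[Box]: ...


class Segmenter(Protocol):
    """Any promptable segmenter (SAM, ...) that accepts box prompts."""
    def segment(self, image, boxes: List[Box]) -> List[object]: ...


def grounded_segment(image, prompt: str, detector: Detector, segmenter: Segmenter):
    """The whole pipeline is just: text -> boxes -> masks.

    Swapping Grounding DINO for GLIP only means passing a different
    `detector` object; the rest of the workflow is unchanged.
    """
    boxes = detector.detect(image, prompt)
    masks = segmenter.segment(image, boxes)
    return boxes, masks
```

This is why the README can list so many combinations below: each new capability (captioning, speech, inpainting) is just another pluggable stage bolted onto the same text-to-boxes-to-masks core.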

  • Segment Anything is a strong segmentation model. But it needs prompts (like boxes/points) to generate masks.
  • Grounding DINO is a strong zero-shot detector, capable of generating high-quality boxes and labels from free-form text.
  • The combination of Grounding DINO + SAM makes it possible to detect and segment anything, at any level of granularity, with text inputs!
  • The combination of BLIP + Grounding DINO + SAM forms an automatic labeling system!
  • The combination of Grounding DINO + SAM + Stable Diffusion acts as a data factory, generating new data!
  • The combination of Whisper + Grounding DINO + SAM lets you detect and segment anything with speech!
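As a concrete example of the Grounding DINO + SAM handoff: Grounding DINO emits boxes in normalized (cx, cy, w, h) form, while SAM's box prompt expects absolute (x0, y0, x1, y1) pixel coordinates, so the glue between the two models is mostly this coordinate conversion. A minimal sketch (the surrounding model calls are omitted):

```python
from typing import List, Tuple


def cxcywh_to_xyxy(
    boxes: List[Tuple[float, float, float, float]],
    width: int,
    height: int,
) -> List[Tuple[float, float, float, float]]:
    """Convert normalized (cx, cy, w, h) boxes, as produced by a
    text-prompted detector like Grounding DINO, into absolute
    (x0, y0, x1, y1) pixel boxes usable as SAM box prompts."""
    out = []
    for cx, cy, w, h in boxes:
        out.append((
            (cx - w / 2) * width,   # left edge in pixels
            (cy - h / 2) * height,  # top edge in pixels
            (cx + w / 2) * width,   # right edge in pixels
            (cy + h / 2) * height,  # bottom edge in pixels
        ))
    return out


# A box centered in a 100x200 image, covering half of each dimension:
print(cxcywh_to_xyxy([(0.5, 0.5, 0.5, 0.5)], 100, 200))
# → [(25.0, 50.0, 75.0, 150.0)]
```

The converted boxes are then fed to SAM's predictor as box prompts, one mask per detected phrase; everything downstream (labeling, inpainting, speech input) reuses these same boxes and masks.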