A Collection of Resources on 3D-Related Large AI Models

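This is a curated list of papers on 3D-related tasks powered by large language models (LLMs). It covers a range of tasks, including 3D understanding, reasoning, generation, and embodied agents.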

[1]Table of Contents

• Awesome-LLM-3D[2]

• 3D Understanding[3]

• 3D Reasoning[4]

• 3D Generation[5]

• 3D Embodied Agent[6]

• 3D Benchmarks[7]

• Contributing[8]

[9]3D Understanding

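| ID | Keywords | Institute | Paper | Publication | Others |
| --- | --- | --- | --- | --- | --- |
| 1 | 3D-LLM | UCLA | 3D-LLM: Injecting the 3D World into Large Language Models[10] | NeurIPS'2023 | github[11] |
| 2 | LL3DA | Fudan University | LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning[12] | Arxiv | github[13] |
| 3 | LLM-Grounder | University of Michigan | LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent[14] | Arxiv | github[15] |
| 4 | Point-Bind | CUHK | Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following[16] | Arxiv | github[17] |
| 5 | 3D-VisTA | BIGAI | 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment[18] | ICCV'2023 | github[19] |
| 6 | LEO | BIGAI | An Embodied Generalist Agent in 3D World[20] | Arxiv | github[21] |
| 7 | OpenScene | ETHz | OpenScene: 3D Scene Understanding with Open Vocabularies[22] | CVPR'2023 | github[23] |
| 8 | LERF | UC Berkeley | LERF: Language Embedded Radiance Fields[24] | ICCV'2023 | github[25] |
| 9 | ViewRefer | CUHK | ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding[26] | ICCV'2023 | github[27] |
| 10 | Contrastive Lift | Oxford-VGG | Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion[28] | NeurIPS'2023 | github[29] |
| 11 | CLIP2Scene | HKU | CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP[30] | CVPR'2023 | github[31] |
| 12 | PointLLM | CUHK | PointLLM: Empowering Large Language Models to Understand Point Clouds[32] | Arxiv | github[33] |
| 13 | - | MIT | Leveraging Large (Visual) Language Models for Robot 3D Scene Understanding[34] | Arxiv | github[35] |
| 14 | Chat-3D | Zhejiang University | Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes[36] | Arxiv | github[37] |
| 15 | PLA | HKU | PLA: Language-Driven Open-Vocabulary 3D Scene Understanding[38] | CVPR'2023 | github[39] |
| 16 | UniT3D | TUM | UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding[40] | ICCV'2023 | github[41] |
| 17 | CG3D | Johns Hopkins University | CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition[42] | Arxiv | github[43] |
| 18 | JM3D-LLM | Xiamen University | JM3D & JM3D-LLM: Elevating 3D Representation with Joint Multi-modal Cues[44] | ACM MM'2023 | github[45] |
| 19 | Open-Fusion | - | Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation[46] | Arxiv | github[47] |
| 20 | - | - | From Language to 3D Worlds: Adapting Language Model for Point Cloud Perception[48] | OpenReview | - |
| 21 | OpenNerf | - | OpenNerf: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views[49] | OpenReview | github[50] |
| 22 | - | KAUST & LIX | Zero-Shot 3D Shape Correspondence[51] | Siggraph Asia 2023 | - |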

[52]3D Reasoning

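| ID | Keywords | Institute (first) | Paper | Publication | Others |
| --- | --- | --- | --- | --- | --- |
| 1 | 3D-CLR | UCLA | 3D Concept Learning and Reasoning from Multi-View Images[53] | CVPR'2023 | github[54] |
| 2 | Transcribe3D | TTI, Chicago | Transcribe3D: Grounding LLMs Using Transcribed Information for 3D Referential Reasoning with Self-Corrected Finetuning[55] | CoRL'2023 | github[56] |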

[57]3D Generation

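| ID | Keywords | Institute | Paper | Publication | Others |
| --- | --- | --- | --- | --- | --- |
| 1 | 3D-GPT | Australian National University | 3D-GPT: Procedural 3D Modeling with Large Language Models[58] | Arxiv | github[59] |
| 2 | MeshGPT | TUM | MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers[60] | Arxiv | project[61] |
| 3 | ShapeGPT | Fudan University | ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model[62] | Arxiv | github[63] |
| 4 | DreamLLM | MEGVII & Tsinghua University | DreamLLM: Synergistic Multimodal Comprehension and Creation[64] | Arxiv | github[65] |
| 5 | LLMR | MIT, RPI & Microsoft | LLMR: Real-time Prompting of Interactive Worlds using Large Language Models[66] | Arxiv | github[67] |
| 6 | ChatAvatar | Deemos Tech | DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance[68] | ACM TOG | website[69] |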

[70]3D Embodied Agent

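| ID | Keywords | Institute | Paper | Publication | Others |
| --- | --- | --- | --- | --- | --- |
| 1 | RT-1 | Google | RT-1: Robotics Transformer for Real-World Control at Scale[71] | Arxiv | github[72] |
| 2 | RT-2 | Google-DeepMind | RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control[73] | Arxiv | github[74] |
| 3 | SayPlan | QUT Centre for Robotics | SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning[75] | CoRL'2023 | github[76] |
| 4 | UniHSI | Shanghai AI Lab | Unified Human-Scene Interaction via Prompted Chain-of-Contacts[77] | Arxiv | github[78] |
| 5 | LLM-Planner | The Ohio State University | LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models[79] | ICCV'2023 | github[80] |
| 6 | STEVE | Zhejiang University & University of Washington | See and Think: Embodied Agent in Virtual Environment[81] | Arxiv | github[82] |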

[83]3D Benchmarks

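| ID | Keywords | Institute | Paper | Publication | Others |
| --- | --- | --- | --- | --- | --- |
| 1 | ScanQA | RIKEN AIP | ScanQA: 3D Question Answering for Spatial Scene Understanding[84] | CVPR'2022 | github[85] |
| 2 | ScanRefer | TUM | ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language[86] | ECCV'2020 | github[87] |
| 3 | Scan2Cap | TUM | Scan2Cap: Context-aware Dense Captioning in RGB-D Scans[88] | CVPR'2021 | github[89] |
| 4 | SQA3D | BIGAI | SQA3D: Situated Question Answering in 3D Scenes[90] | ICLR'2023 | github[91] |
| 5 | - | DeepMind & UCL | Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects[92] | Arxiv | github[93] |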

[94]Citation

For more information, see: https://github.com/ActiveVisionLab/Awesome-LLM-3D

References

[1] : https://github.com/ActiveVisionLab/Awesome-LLM-3D#table-of-content
[2] Awesome-LLM-3D: https://github.com/ActiveVisionLab/Awesome-LLM-3D#awesome-llm-3D
[3] 3D Understanding: https://github.com/ActiveVisionLab/Awesome-LLM-3D#3d-understanding
[4] 3D Reasoning: https://github.com/ActiveVisionLab/Awesome-LLM-3D#3d-reasoning
[5] 3D Generation: https://github.com/ActiveVisionLab/Awesome-LLM-3D#3d-generation
[6] 3D Embodied Agent: https://github.com/ActiveVisionLab/Awesome-LLM-3D#3d-embodied-agent
[7] 3D Benchmarks: https://github.com/ActiveVisionLab/Awesome-LLM-3D#3d-benchmarks
[8] Contributing: https://github.com/ActiveVisionLab/Awesome-LLM-3D#contributing
[9] : https://github.com/ActiveVisionLab/Awesome-LLM-3D#3d-understanding
[10] 3D-LLM: Injecting the 3D World into Large Language Models: https://arxiv.org/pdf/2307.12981.pdf
[11] github: https://github.com/UMass-Foundation-Model/3D-LLM
[12] LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning: https://arxiv.org/pdf/2311.18651.pdf
[13] github: https://github.com/Open3DA/LL3DA
[14] LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent: https://arxiv.org/pdf/2309.12311.pdf
[15] github: https://github.com/sled-group/chat-with-nerf
[16] Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following: https://arxiv.org/pdf/2309.00615.pdf
[17] github: https://github.com/ZiyuGuo99/Point-Bind_Point-LLM
[18] 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment: https://arxiv.org/abs/2308.04352
[19] github: https://github.com/ActiveVisionLab/Awesome-LLM-3D/blob/avl-branch
[20] An Embodied Generalist Agent in 3D World: https://arxiv.org/pdf/2311.12871.pdf
[21] github: https://github.com/embodied-generalist/embodied-generalist
[22] OpenScene: 3D Scene Understanding with Open Vocabularies: https://arxiv.org/pdf/2211.15654.pdf
[23] github: https://github.com/pengsongyou/openscene
[24] LERF: Language Embedded Radiance Fields: https://arxiv.org/pdf/2303.09553.pdf
[25] github: https://github.com/kerrj/lerf
[26] ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding: https://arxiv.org/pdf/2303.16894.pdf
[27] github: https://github.com/Ivan-Tang-3D/ViewRefer3D
[28] Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion: https://arxiv.org/pdf/2306.04633.pdf
[29] github: https://github.com/yashbhalgat/Contrastive-Lift
[30] CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP: https://arxiv.org/pdf/2301.04926.pdf
[31] github: https://github.com/runnanchen/CLIP2Scene
[32] PointLLM: Empowering Large Language Models to Understand Point Clouds: https://arxiv.org/pdf/2308.16911.pdf
[33] github: https://github.com/OpenRobotLab/PointLLM
[34] Leveraging Large (Visual) Language Models for Robot 3D Scene Understanding: https://arxiv.org/pdf/2209.05629.pdf
[35] github: https://github.com/MIT-SPARK/llm_scene_understanding
[36] Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes: https://arxiv.org/pdf/2308.08769v1.pdf
[37] github: https://github.com/Chat-3D/Chat-3D
[38] PLA: Language-Driven Open-Vocabulary 3D Scene Understanding: https://arxiv.org/pdf/2211.16312.pdf
[39] github: https://github.com/CVMI-Lab/PLA
[40] UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding: https://openaccess.thecvf.com/content/ICCV2023/papers/Chen_UniT3D_A_Unified_Transformer_for_3D_Dense_Captioning_and_Visual_ICCV_2023_paper.pdf
[41] github: https://github.com/ActiveVisionLab/Awesome-LLM-3D/blob/avl-branch
[42] CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition: https://arxiv.org/pdf/2303.11313.pdf
[43] github: https://github.com/deeptibhegde/CLIP-goes-3D
[44] JM3D & JM3D-LLM: Elevating 3D Representation with Joint Multi-modal Cues: https://arxiv.org/pdf/2310.09503v2.pdf
[45] github: https://github.com/mr-neko/jm3d
[46] Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation: https://arxiv.org/pdf/2310.03923.pdf
[47] github: https://github.com/UARK-AICV/OpenFusion
[48] From Language to 3D Worlds: Adapting Language Model for Point Cloud Perception: https://openreview.net/forum?id=H49g8rRIiF
[49] OpenNerf: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views: https://openreview.net/pdf?id=SgjAojPKb3
[50] github: https://github.com/ActiveVisionLab/Awesome-LLM-3D/blob/avl-branch
[51] Zero-Shot 3D Shape Correspondence: https://arxiv.org/abs/2306.03253
[52] : https://github.com/ActiveVisionLab/Awesome-LLM-3D#3d-reasoning
[53] 3D Concept Learning and Reasoning from Multi-View Images: https://arxiv.org/pdf/2303.11327.pdf
[54] github: https://github.com/evelinehong/3D-CLR-Official
[55] Transcribe3D: Grounding LLMs Using Transcribed Information for 3D Referential Reasoning with Self-Corrected Finetuning: https://openreview.net/pdf?id=7j3sdUZMTF
[56] github: https://github.com/ActiveVisionLab/Awesome-LLM-3D/blob/avl-branch
[57] : https://github.com/ActiveVisionLab/Awesome-LLM-3D#3d-generation
[58] 3D-GPT: Procedural 3D Modeling with Large Language Models: https://arxiv.org/pdf/2310.12945.pdf
[59] github: https://github.com/ActiveVisionLab/Awesome-LLM-3D/blob/avl-branch
[60] MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers: https://arxiv.org/pdf/2311.15475.pdf
[61] project: https://nihalsid.github.io/mesh-gpt/
[62] ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model: https://arxiv.org/pdf/2311.17618.pdf
[63] github: https://github.com/OpenShapeLab/ShapeGPT
[64] DreamLLM: Synergistic Multimodal Comprehension and Creation: https://arxiv.org/pdf/2309.11499.pdf
[65] github: https://dreamllm.github.io/
[66] LLMR: Real-time Prompting of Interactive Worlds using Large Language Models: https://arxiv.org/pdf/2309.12276.pdf
[67] github: https://github.com/ActiveVisionLab/Awesome-LLM-3D/blob/avl-branch
[68] DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance: https://dl.acm.org/doi/abs/10.1145/3592094
[69] website: https://hyperhuman.deemos.com/
[70] : https://github.com/ActiveVisionLab/Awesome-LLM-3D#3d-embodied-agent
[71] RT-1: Robotics Transformer for Real-World Control at Scale: https://robotics-transformer1.github.io/assets/rt1.pdf
[72] github: https://robotics-transformer1.github.io/
[73] RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control: https://arxiv.org/pdf/2307.15818.pdf
[74] github: https://robotics-transformer2.github.io/
[75] SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning: https://arxiv.org/pdf/2307.06135.pdf
[76] github: https://sayplan.github.io/
[77] Unified Human-Scene Interaction via Prompted Chain-of-Contacts: https://arxiv.org/pdf/2309.07918.pdf
[78] github: https://github.com/OpenRobotLab/UniHSI
[79] LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models: https://arxiv.org/pdf/2212.04088.pdf
[80] github: https://github.com/OSU-NLP-Group/LLM-Planner/
[81] See and Think: Embodied Agent in Virtual Environment: https://arxiv.org/abs/2311.15209
[82] github: https://github.com/rese1f/STEVE
[83] : https://github.com/ActiveVisionLab/Awesome-LLM-3D#3d-benchmarks
[84] ScanQA: 3D Question Answering for Spatial Scene Understanding: https://arxiv.org/pdf/2112.10482.pdf
[85] github: https://github.com/ATR-DBI/ScanQA
[86] ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language: https://arxiv.org/pdf/1912.08830.pdf
[87] github: https://daveredrum.github.io/ScanRefer/
[88] Scan2Cap: Context-aware Dense Captioning in RGB-D Scans: https://arxiv.org/pdf/2012.02206.pdf
[89] github: https://github.com/daveredrum/Scan2Cap
[90] SQA3D: Situated Question Answering in 3D Scenes: https://arxiv.org/pdf/2210.07474.pdf
[91] github: https://github.com/SilongYong/SQA3D
[92] Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects: https://arxiv.org/pdf/2311.17851.pdf
[93] github: https://github.com/ActiveVisionLab/Awesome-LLM-3D/blob/avl-branch
[94] : https://github.com/ActiveVisionLab/Awesome-LLM-3D#contributing
