Paper Title
Extracting Zero-shot Common Sense from Large Language Models for Robot 3D Scene Understanding
Authors
Abstract
Semantic 3D scene understanding is a problem of critical importance in robotics. While significant advances have been made in simultaneous localization and mapping algorithms, robots are still far from having the common sense knowledge about household objects and their locations of an average human. We introduce a novel method for leveraging common sense embedded within large language models for labelling rooms given the objects contained within. This algorithm has the added benefits of (i) requiring no task-specific pre-training (operating entirely in the zero-shot regime) and (ii) generalizing to arbitrary room and object labels, including previously-unseen ones -- both of which are highly desirable traits in robotic scene understanding algorithms. The proposed algorithm operates on 3D scene graphs produced by modern spatial perception systems, and we hope it will pave the way to more generalizable and scalable high-level 3D scene understanding for robotics.