Human keypoint detection for close proximity human-robot interaction

Paper title

Human keypoint detection for close proximity human-robot interaction

Paper authors

Docekal, Jan; Rozlivek, Jakub; Matas, Jiri; Hoffmann, Matej

Abstract

We study the performance of state-of-the-art human keypoint detectors in the context of close proximity human-robot interaction. The detection in this scenario is specific in that only a subset of body parts such as hands and torso are in the field of view. In particular, (i) we survey existing datasets with human pose annotation from the perspective of close proximity images and prepare and make publicly available a new Human in Close Proximity (HiCP) dataset; (ii) we quantitatively and qualitatively compare state-of-the-art human whole-body 2D keypoint detection methods (OpenPose, MMPose, AlphaPose, Detectron2) on this dataset; (iii) since accurate detection of hands and fingers is critical in applications with handovers, we evaluate the performance of the MediaPipe hand detector; (iv) we deploy the algorithms on a humanoid robot with an RGB-D camera on its head and evaluate the performance in 3D human keypoint detection. A motion capture system is used as reference. The best performing whole-body keypoint detectors in close proximity were MMPose and AlphaPose, but both had difficulty with finger detection. Thus, we propose a combination of MMPose or AlphaPose for the body and MediaPipe for the hands in a single framework providing the most accurate and robust detection. We also analyse the failure modes of individual detectors -- for example, to what extent the absence of the head of the person in the image degrades performance. Finally, we demonstrate the framework in a scenario where a humanoid robot interacting with a person uses the detected 3D keypoints for whole-body avoidance maneuvers.
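The abstract does not say how the combined 2D detections become 3D keypoints on the robot. The sketch below assumes a common approach with an RGB-D camera. It is not necessarily the authors' pipeline.

The sketch works in two steps:

- It lets hand-detector keypoints replace the whole-body detector's entries with the same name.
- It back-projects each pixel through a pinhole camera model using the aligned depth value.

The keypoint names, the `Intrinsics` values and the `depth_lookup` callback are hypothetical placeholders. They are not MMPose, AlphaPose or MediaPipe APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Keypoint2D = Tuple[float, float]         # (u, v) pixel coordinates
Keypoint3D = Tuple[float, float, float]  # (X, Y, Z) in metres, camera frame


@dataclass
class Intrinsics:
    """Hypothetical pinhole intrinsics; real values come from camera calibration."""
    fx: float
    fy: float
    cx: float
    cy: float


def merge_keypoints(body: Dict[str, Keypoint2D],
                    hands: Dict[str, Keypoint2D]) -> Dict[str, Keypoint2D]:
    """Start from whole-body keypoints and let hand-detector keypoints
    override any entry with the same (hypothetical, shared) name."""
    merged = dict(body)
    merged.update(hands)
    return merged


def back_project(kp: Keypoint2D, depth_m: float, K: Intrinsics) -> Optional[Keypoint3D]:
    """Lift one pixel with metric depth into the camera frame (pinhole model)."""
    if depth_m <= 0.0:  # missing or invalid depth reading
        return None
    u, v = kp
    x = (u - K.cx) * depth_m / K.fx
    y = (v - K.cy) * depth_m / K.fy
    return (x, y, depth_m)


def lift_all(kps: Dict[str, Keypoint2D],
             depth_lookup: Callable[[Keypoint2D], float],
             K: Intrinsics) -> Dict[str, Keypoint3D]:
    """Back-project every keypoint, dropping those without valid depth.
    depth_lookup is a hypothetical callback that would read the depth image
    aligned with the RGB frame at the keypoint's pixel."""
    out: Dict[str, Keypoint3D] = {}
    for name, kp in kps.items():
        p = back_project(kp, depth_lookup(kp), K)
        if p is not None:
            out[name] = p
    return out
```

For example, with `K = Intrinsics(600.0, 600.0, 320.0, 240.0)`, a keypoint at the principal point `(320.0, 240.0)` with 0.6 m of depth maps to `(0.0, 0.0, 0.6)`. The resulting points are in the camera frame. Using them for avoidance would additionally require transforming them into the robot's frame via its kinematics, which is not shown here.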
