Skeleton-DML: Deep Metric Learning for Skeleton-Based One-Shot Action Recognition

Paper Title

Skeleton-DML: Deep Metric Learning for Skeleton-Based One-Shot Action Recognition

Authors

Raphael Memmesheimer, Simon Häring, Nick Theisen, Dietrich Paulus

Abstract

One-shot action recognition allows the recognition of human-performed actions with only a single training example. This can influence human-robot interaction positively by enabling the robot to react to previously unseen behaviour. We formulate the one-shot action recognition problem as a deep metric learning problem and propose a novel image-based skeleton representation that performs well in a metric learning setting. To this end, we train a model that projects the image representations into an embedding space. In the embedding space, similar actions have a low Euclidean distance while dissimilar actions have a higher distance. The one-shot action recognition problem then becomes a nearest-neighbor search in a set of activity reference samples. We evaluate the performance of our proposed representation against a variety of other skeleton-based image representations. In addition, we present an ablation study that shows the influence of different embedding vector sizes, losses and augmentation. Our approach lifts the state of the art by 3.3% for the one-shot action recognition protocol on the NTU RGB+D 120 dataset under a comparable training setup. With additional augmentation, our result improved by over 7.7%.
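The nearest-neighbor step described in the abstract can be illustrated with a minimal sketch. It assumes the embeddings have already been produced by some trained model. The function name `nearest_reference`, the action labels, and the three-dimensional vectors are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math

def nearest_reference(query, references):
    """Return (label, distance) of the reference embedding closest to `query`.

    `references` maps an action label to a single embedding vector, i.e. one
    reference sample per class (the one-shot setting). Distances are Euclidean.
    """
    best_label, best_dist = None, math.inf
    for label, embedding in references.items():
        d = math.dist(query, embedding)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist

# Hypothetical embeddings: one reference sample per previously unseen action.
references = {
    "wave": [0.9, 0.1, 0.0],
    "sit_down": [0.0, 1.0, 0.2],
    "jump": [0.1, 0.0, 1.0],
}

# Hypothetical embedding of a new, unlabeled action sequence.
query = [0.8, 0.2, 0.1]

label, dist = nearest_reference(query, references)
print(label, round(dist, 4))
```

In this toy setup the query is assigned the label of its closest reference. A real system would compute the embeddings with the trained network and would typically use a vectorized or indexed search rather than a Python loop.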
