Paper Title

Structure-Aware Human-Action Generation

Paper Authors

Ping Yu, Yang Zhao, Chunyuan Li, Junsong Yuan, Changyou Chen

Paper Abstract

Generating long-range skeleton-based human actions has been a challenging problem, since small deviations in one frame can cause a malformed action sequence. Most existing methods borrow ideas from video generation and naively treat skeleton nodes/joints as image pixels without considering the rich inter-frame and intra-frame structure information, leading to potentially distorted actions. Graph convolutional networks (GCNs) are a promising way to leverage structure information to learn structure representations. However, directly adopting GCNs to tackle such continuous action sequences in both spatial and temporal spaces is challenging, as the action graph could be huge. To overcome this issue, we propose a variant of GCNs that leverages the powerful self-attention mechanism to adaptively sparsify a complete action graph in the temporal space. Our method can dynamically attend to important past frames and construct a sparse graph to apply within the GCN framework, capturing the structure information in action sequences well. Extensive experimental results demonstrate the superiority of our method over existing methods on two standard human action datasets.
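The abstract does not give implementation details, but its core idea (self-attention over past frames used to sparsify the temporal action graph before applying a graph convolution) can be sketched roughly as below. This is a minimal illustrative PyTorch sketch under stated assumptions: the layer name `SparseTemporalGCNLayer`, the joint-pooling step, the causal top-k sparsification rule, and all dimensions are assumptions for illustration, not the authors' actual architecture.

```python
# A minimal sketch of the idea from the abstract: score past frames with
# self-attention, keep only the strongest temporal edges (sparse graph),
# then aggregate frame features with a graph convolution over that graph.
# All names, sizes, and the top-k rule are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseTemporalGCNLayer(nn.Module):
    def __init__(self, feat_dim: int, top_k: int = 4):
        super().__init__()
        self.query = nn.Linear(feat_dim, feat_dim)
        self.key = nn.Linear(feat_dim, feat_dim)
        self.gcn_weight = nn.Linear(feat_dim, feat_dim)
        self.top_k = top_k  # how many past frames each frame may attend to

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, joints, feat_dim); pool joints to get a
        # per-frame descriptor used only for scoring temporal edges.
        b, t, j, d = x.shape
        frame_feat = x.mean(dim=2)                               # (b, t, d)
        q = self.query(frame_feat)                               # (b, t, d)
        k = self.key(frame_feat)                                 # (b, t, d)
        scores = torch.matmul(q, k.transpose(1, 2)) / d ** 0.5   # (b, t, t)

        # Causal mask: a frame may only attend to itself and past frames.
        causal = torch.tril(torch.ones(t, t, device=x.device)).bool()
        scores = scores.masked_fill(~causal, float("-inf"))

        # Sparsify: keep only the top-k attention scores per frame and
        # renormalize them, yielding a sparse temporal adjacency matrix.
        k_eff = min(self.top_k, t)
        topk_val, topk_idx = scores.topk(k_eff, dim=-1)
        adj = torch.zeros_like(scores)
        adj.scatter_(-1, topk_idx, F.softmax(topk_val, dim=-1))

        # Graph convolution along time: aggregate joint features from the
        # selected past frames, then apply a learned transformation.
        x_flat = x.reshape(b, t, j * d)
        agg = torch.matmul(adj, x_flat).reshape(b, t, j, d)
        return F.relu(self.gcn_weight(agg))


if __name__ == "__main__":
    layer = SparseTemporalGCNLayer(feat_dim=16, top_k=3)
    skeleton_seq = torch.randn(2, 30, 25, 16)  # 2 sequences, 30 frames, 25 joints
    out = layer(skeleton_seq)
    print(out.shape)  # torch.Size([2, 30, 25, 16])
```

The top-k step is just one simple way to obtain a sparse adjacency from attention scores; the paper's actual sparsification strategy and spatial (intra-frame) graph handling may differ.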
