Paper Title
Dynamic and Static Context-aware LSTM for Multi-agent Motion Prediction
Paper Authors
Paper Abstract
Multi-agent motion prediction is challenging because it aims to foresee the future trajectories of multiple agents (\textit{e.g.} pedestrians) simultaneously in a complicated scene. Existing work addresses this challenge either by learning the social spatial interactions represented by the positions of a group of pedestrians, while ignoring their temporal coherence (\textit{i.e.} dependencies between different long trajectories), or by understanding the complicated scene layout (\textit{e.g.} scene segmentation) to ensure safe navigation. Unlike previous work that treated spatial interaction, temporal coherence, and scene layout in isolation, this paper designs a new mechanism, \textit{i.e.}, the Dynamic and Static Context-aware Motion Predictor (DSCMP), to integrate this rich information into a long short-term memory (LSTM) network. It has three appealing benefits. (1) DSCMP models the dynamic interactions between agents by learning both their spatial positions and temporal coherence, while also understanding the contextual scene layout. (2) Different from previous LSTM models that predict motions by propagating hidden features frame by frame, which limits their capacity to learn correlations between long trajectories, we carefully design a differentiable queue mechanism in DSCMP that explicitly memorizes and learns the correlations between long trajectories. (3) DSCMP captures the scene context by inferring a latent variable, which enables multimodal predictions consistent with the semantic scene layout. Extensive experiments show that DSCMP outperforms state-of-the-art methods by large margins, e.g., 9.05\% and 7.62\% relative improvements on the ETH-UCY and SDD datasets, respectively.
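The abstract describes the differentiable queue only at a high level. As a conceptual illustration of the general idea, the sketch below implements the forward pass of a toy LSTM-style cell that keeps a fixed-length FIFO of past hidden states and attends over it at each step, so the update can look beyond the single previous hidden state. All class and weight names, the dot-product attention form, and the queue length are assumptions for illustration, not the paper's actual formulation (which would be trained end-to-end in an autodiff framework).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class QueueLSTMCell:
    """Toy LSTM cell augmented with a fixed-length queue of past
    hidden states. At each step the cell attends over the queue so
    the gate computations can see several past frames at once,
    instead of only h_{t-1}. Illustrative sketch only."""

    def __init__(self, input_dim, hidden_dim, queue_len=4, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden_dim = hidden_dim
        self.queue_len = queue_len
        # One weight matrix per gate: input i, forget f, output o, candidate g.
        # Gates read the concatenation [x_t, h_{t-1}, queue summary].
        d = input_dim + 2 * hidden_dim
        self.W = {k: rng.normal(0.0, 0.1, (hidden_dim, d)) for k in "ifog"}
        self.b = {k: np.zeros(hidden_dim) for k in "ifog"}
        self.queue = []  # FIFO of past hidden states

    def step(self, x, h, c):
        # Summarize the queue with dot-product attention against h.
        if self.queue:
            Q = np.stack(self.queue)   # (L, hidden_dim)
            attn = softmax(Q @ h)      # attention weights over past frames
            summary = attn @ Q         # (hidden_dim,)
        else:
            summary = np.zeros(self.hidden_dim)
        z = np.concatenate([x, h, summary])
        gate = lambda k: self.W[k] @ z + self.b[k]
        i = 1.0 / (1.0 + np.exp(-gate("i")))
        f = 1.0 / (1.0 + np.exp(-gate("f")))
        o = 1.0 / (1.0 + np.exp(-gate("o")))
        g = np.tanh(gate("g"))
        c = f * c + i * g
        h = o * np.tanh(c)
        # Push the new hidden state; drop the oldest beyond queue_len.
        self.queue.append(h)
        if len(self.queue) > self.queue_len:
            self.queue.pop(0)
        return h, c
```

In this sketch the queue gives each update a view of the last `queue_len` frames, which is the intuition behind learning correlations between long trajectories rather than relying on frame-by-frame propagation alone.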