Paper Title


GateHUB: Gated History Unit with Background Suppression for Online Action Detection

Paper Authors

Junwen Chen, Gaurav Mittal, Ye Yu, Yu Kong, Mei Chen

Paper Abstract


Online action detection is the task of predicting the action as soon as it happens in a streaming video. A major challenge is that the model does not have access to the future and has to solely rely on the history, i.e., the frames observed so far, to make predictions. It is therefore important to accentuate parts of the history that are more informative to the prediction of the current frame. We present GateHUB, Gated History Unit with Background Suppression, that comprises a novel position-guided gated cross-attention mechanism to enhance or suppress parts of the history as per how informative they are for current frame prediction. GateHUB further proposes Future-augmented History (FaH) to make history features more informative by using subsequently observed frames when available. In a single unified framework, GateHUB integrates the transformer's ability of long-range temporal modeling and the recurrent model's capacity to selectively encode relevant information. GateHUB also introduces a background suppression objective to further mitigate false positive background frames that closely resemble the action frames. Extensive validation on three benchmark datasets, THUMOS, TVSeries, and HDD, demonstrates that GateHUB significantly outperforms all existing methods and is also more efficient than the existing best work. Furthermore, a flow-free version of GateHUB is able to achieve higher or close accuracy at 2.8x higher frame rate compared to all existing methods that require both RGB and optical flow information for prediction.
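To make the core idea concrete, below is a minimal sketch of gated cross-attention over history features. It is an illustrative simplification, not the paper's implementation: a single query vector for the current frame attends over history features, and a hypothetical learned sigmoid gate (`w_gate`, `b_gate` are assumed parameters) enhances or suppresses each history frame before the attention-weighted sum. The paper's position guidance, multi-head structure, and Future-augmented History are omitted here.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_cross_attention(query, history, w_gate, b_gate):
    """Attend from the current-frame query over history features.

    query:   (d,)   feature of the current frame
    history: (T, d) features of the T frames observed so far
    w_gate, b_gate: hypothetical gate parameters (not from the paper)

    Returns a (d,) summary of the history in which a per-frame
    sigmoid gate has scaled each frame's attention weight up or down,
    mimicking "enhance or suppress parts of the history".
    """
    d = query.shape[0]
    scores = history @ query / np.sqrt(d)      # (T,) scaled dot-product
    attn = softmax(scores)                     # (T,) attention weights
    gate = sigmoid(history @ w_gate + b_gate)  # (T,) values in (0, 1)
    gated = attn * gate                        # suppress uninformative frames
    gated = gated / (gated.sum() + 1e-8)       # renormalize to sum to 1
    return gated @ history                     # (d,) gated history summary
```

A frame whose gate value is near 0 contributes almost nothing to the summary regardless of its raw attention score, which is the intuition behind suppressing false-positive background frames that resemble actions.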
