Hierarchical I3D for Sign Spotting

Paper Title

Hierarchical I3D for Sign Spotting

Authors

Ryan Wong, Necati Cihan Camgöz, Richard Bowden

Abstract

Most vision-based sign language research to date has focused on Isolated Sign Language Recognition (ISLR), where the objective is to predict a single sign class given a short video clip. Although there has been significant progress in ISLR, its real-life applications are limited. In this paper, we instead focus on the challenging task of Sign Spotting, where the goal is to simultaneously identify and localise signs in continuous, co-articulated sign videos. To address the limitations of current ISLR-based models, we propose a hierarchical sign spotting approach that learns coarse-to-fine spatio-temporal sign features, taking advantage of representations at various temporal levels to provide more precise sign localisation. Specifically, we develop the Hierarchical Sign I3D model (HS-I3D), which consists of a hierarchical network head attached to the existing spatio-temporal I3D model to exploit features from different layers of the network. We evaluate HS-I3D on the ChaLearn 2022 Sign Spotting Challenge (MSSL track) and achieve a state-of-the-art F1 score of 0.607, which was the top-1 winning solution of the competition.
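Because sign spotting must both identify and localise signs, its output is a set of (class, start, end) segments rather than a single label, and an F1 score is computed over those segments. The sketch below illustrates that idea under stated assumptions; it is not the authors' post-processing or the official challenge scorer. It assumes hypothetical per-frame labels with a background token `<bg>`, times in milliseconds, greedy one-to-one matching, and a single temporal-IoU threshold of 0.5. The actual challenge protocol (for example, averaging over several IoU thresholds) may differ.

```python
from typing import List, Tuple

# A segment is (sign_class, start_ms, end_ms); the millisecond unit is an assumption.
Segment = Tuple[str, float, float]


def frames_to_segments(labels: List[str], fps: float, background: str = "<bg>") -> List[Segment]:
    """Merge runs of identical non-background per-frame labels into segments.
    Hypothetical post-processing step, shown only for illustration."""
    segments: List[Segment] = []
    i = 0
    while i < len(labels):
        j = i
        while j < len(labels) and labels[j] == labels[i]:
            j += 1  # extend the run of identical labels
        if labels[i] != background:
            segments.append((labels[i], i * 1000.0 / fps, j * 1000.0 / fps))
        i = j
    return segments


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two time intervals (class is ignored here)."""
    inter = max(0.0, min(a[2], b[2]) - max(a[1], b[1]))
    union = (a[2] - a[1]) + (b[2] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def spotting_f1(preds: List[Segment], gts: List[Segment], iou_thr: float = 0.5) -> float:
    """F1 where a prediction counts as a true positive if it has the same class as
    a not-yet-matched ground-truth segment and overlaps it with IoU >= iou_thr.
    Greedy one-to-one matching; iou_thr=0.5 is an assumed default."""
    matched = [False] * len(gts)
    tp = 0
    for p in preds:
        best_j, best_iou = -1, iou_thr
        for j, g in enumerate(gts):
            if matched[j] or g[0] != p[0]:
                continue
            iou = temporal_iou(p, g)
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0:
            matched[best_j] = True
            tp += 1
    if tp == 0:
        return 0.0
    precision = tp / len(preds)  # tp / (tp + fp)
    recall = tp / len(gts)  # tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gts = [("HELLO", 0, 1000), ("THANKS", 1500, 2500)]
preds = [("HELLO", 100, 900), ("THANKS", 3000, 3500), ("HELLO", 2000, 2600)]
print(spotting_f1(preds, gts))  # one true positive out of 3 predictions and 2 targets
```

In this toy example only the first prediction matches (IoU 0.8), so precision is 1/3, recall is 1/2, and F1 is 0.4. Requiring both the correct class and sufficient temporal overlap is what makes a spotting metric sensitive to localisation precision, which is the property the hierarchical coarse-to-fine features are meant to improve.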
