Paper title
Hierarchical I3D for Sign Spotting
Paper authors
Paper abstract
Most vision-based sign language research to date has focused on Isolated Sign Language Recognition (ISLR), where the objective is to predict a single sign class given a short video clip. Although ISLR has seen significant progress, its real-life applications are limited. In this paper, we focus instead on the challenging task of Sign Spotting, where the goal is to simultaneously identify and localise signs in continuous, co-articulated sign videos. To address the limitations of current ISLR-based models, we propose a hierarchical sign spotting approach that learns coarse-to-fine spatio-temporal sign features, taking advantage of representations at various temporal levels to provide more precise sign localisation. Specifically, we develop the Hierarchical Sign I3D model (HS-I3D), which consists of a hierarchical network head attached to the existing spatio-temporal I3D model to exploit features at different layers of the network. We evaluate HS-I3D on the ChaLearn 2022 Sign Spotting Challenge (MSSL track) and achieve a state-of-the-art F1 score of 0.607, which was the top-1 winning solution of the competition.
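Because sign spotting requires both identifying and localising each sign, an F1 score for this task has to count a prediction as correct only when its class and its temporal extent both match an annotation. The sketch below illustrates one way such a score could be computed. It is not the official ChaLearn 2022 evaluation protocol: the `Span` layout, the single `iou_threshold` of 0.5, and the greedy one-to-one matching are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical span layout: (class_id, start_frame, end_frame), end exclusive.
Span = Tuple[int, int, int]


def temporal_iou(a: Span, b: Span) -> float:
    """Intersection over union of two frame intervals (class is ignored)."""
    inter = max(0, min(a[2], b[2]) - max(a[1], b[1]))
    union = (a[2] - a[1]) + (b[2] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def spotting_f1(preds: List[Span], truths: List[Span],
                iou_threshold: float = 0.5) -> float:
    """F1 over spotted signs.

    Assumption: a prediction is a true positive when it has the same class as
    a not-yet-matched annotation and their temporal IoU reaches the threshold.
    Each annotation can be matched at most once (greedy, in prediction order).
    """
    matched = set()
    true_pos = 0
    for pred in preds:
        best_idx, best_iou = -1, 0.0
        for idx, truth in enumerate(truths):
            if idx in matched or truth[0] != pred[0]:
                continue
            iou = temporal_iou(pred, truth)
            if iou > best_iou:
                best_idx, best_iou = idx, iou
        if best_idx >= 0 and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_pos += 1
    false_pos = len(preds) - true_pos
    false_neg = len(truths) - true_pos
    denom = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denom if denom else 0.0


if __name__ == "__main__":
    truths = [(3, 10, 30), (7, 50, 70)]
    preds = [(3, 12, 28), (7, 65, 90), (5, 0, 5)]
    print(spotting_f1(preds, truths))
```

In this toy example only the first prediction is both correctly classified and well localised; the second has the right class but overlaps its annotation too little, and the third has no annotation of its class. This is the distinction that separates spotting from isolated recognition, where localisation plays no part in the score.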