Paper title
SpotFast Networks with Memory Augmented Lateral Transformers for Lipreading
Authors
Abstract
This paper presents a novel deep learning architecture for word-level lipreading. Previous work suggests the potential of using pretrained deep 3D convolutional neural networks as a front-end feature extractor. We introduce SpotFast networks, a variant of the state-of-the-art SlowFast networks for action recognition, which use a temporal window as a spot pathway and all frames as a fast pathway. We further incorporate memory-augmented lateral transformers to learn sequential features for classification. We evaluate the proposed model on the LRW dataset. The experiments show that the proposed model outperforms various state-of-the-art models, and that incorporating the memory-augmented lateral transformers yields a 3.7% improvement over the SpotFast networks alone.
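The two-pathway input described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `split_pathways`, the centered placement of the window, and the window length of 9 are assumptions made for illustration only. The 29-frame clip length matches LRW clips. The sketch shows only how one clip could be divided into a spot input (a temporal window) and a fast input (all frames).

```python
def split_pathways(frames, window=9):
    """Split one clip into (spot, fast) inputs.

    Assumptions (not taken from the paper): the spot window is
    centered in the clip and its length is a free parameter.
    """
    if not 0 < window <= len(frames):
        raise ValueError("window must be between 1 and the clip length")
    # Centered temporal window for the spot pathway.
    start = (len(frames) - window) // 2
    spot = frames[start:start + window]
    # The fast pathway sees every frame of the clip.
    fast = list(frames)
    return spot, fast


# Stand-in for a 29-frame clip; integers play the role of frames.
clip = list(range(29))
spot, fast = split_pathways(clip, window=9)
print(len(spot), len(fast))
```

In a real model each pathway would feed its own 3D convolutional front end; here the frames are plain integers so the slicing logic is easy to check.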