Paper Title


Explore More Guidance: A Task-aware Instruction Network for Sign Language Translation Enhanced with Data Augmentation

Authors

Yong Cao, Wei Li, Xianzhi Li, Min Chen, Guangyong Chen, Long Hu, Zhengdao Li, Kai Hwang

Abstract


Sign language recognition and translation first uses a recognition module to generate glosses from sign language videos and then employs a translation module to translate glosses into spoken sentences. Most existing works focus on the recognition step, while paying less attention to sign language translation. In this work, we propose a task-aware instruction network, namely TIN-SLT, for sign language translation, by introducing the instruction module and the learning-based feature fuse strategy into a Transformer network. In this way, the pre-trained model's language ability can be well explored and utilized to further boost the translation performance. Moreover, by exploring the representation space of sign language glosses and target spoken language, we propose a multi-level data augmentation scheme to adjust the data distribution of the training set. We conduct extensive experiments on two challenging benchmark datasets, PHOENIX-2014-T and ASLG-PC12, on which our method outperforms former best solutions by 1.65 and 1.42 in terms of BLEU-4. Our code is published at https://github.com/yongcaoplus/TIN-SLT.
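The abstract does not spell out how the learning-based feature fuse strategy combines the pre-trained model's features with the task features. As a rough illustration only, the sketch below shows one common way such a fusion can work: a learned elementwise sigmoid gate that interpolates between a pre-trained feature vector and a task-specific one. All names (`gated_fuse`, the weight vectors `w_p`, `w_t`, and bias `b`) are hypothetical and not taken from the paper; TIN-SLT's actual strategy may differ.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(pretrained_feat, task_feat, w_p, w_t, b):
    """Fuse two feature vectors with a learned per-dimension gate.

    For each dimension i, a gate g_i = sigmoid(w_p[i]*p_i + w_t[i]*t_i + b[i])
    decides how much of the pre-trained feature to keep versus the
    task-specific feature: fused_i = g_i * p_i + (1 - g_i) * t_i.
    The weights w_p, w_t and bias b would be learned during training.
    """
    fused = []
    for p_i, t_i, wp_i, wt_i, b_i in zip(pretrained_feat, task_feat, w_p, w_t, b):
        g = sigmoid(wp_i * p_i + wt_i * t_i + b_i)
        fused.append(g * p_i + (1.0 - g) * t_i)
    return fused
```

With zero weights and bias the gate is 0.5 everywhere, so the fusion reduces to a plain average of the two feature vectors; during training the gate would learn, per dimension, how much to trust the pre-trained representation.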
