Paper Title

Leveraging Graph-based Cross-modal Information Fusion for Neural Sign Language Translation

Paper Authors

Jiangbin Zheng, Siyuan Li, Cheng Tan, Chong Wu, Yidong Chen, Stan Z. Li

Paper Abstract

Sign Language (SL), as the mother tongue of the deaf community, is a special visual language that most hearing people cannot understand. In recent years, neural Sign Language Translation (SLT), as a possible way of bridging the communication gap between the deaf and the hearing, has attracted widespread academic attention. We found that current mainstream end-to-end neural SLT models, which try to learn language knowledge in a weakly supervised manner, cannot mine enough semantic information under low-data-resource conditions. Therefore, we propose to introduce additional word-level semantic knowledge from sign language linguistics to help improve current end-to-end neural SLT models. Concretely, we propose a novel neural SLT model with multi-modal feature fusion based on a dynamic graph, in which the cross-modal information, i.e., text and video, is first assembled into a dynamic graph according to its correlation; the graph is then processed by a multi-modal graph encoder to generate multi-modal embeddings for use in the subsequent neural translation model. To the best of our knowledge, we are the first to introduce graph neural networks, for fusing multi-modal information, into neural sign language translation models. Moreover, we conducted experiments on the publicly available SLT dataset RWTH-PHOENIX-Weather-2014T, and the quantitative results show that our method improves the model's performance.
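
The abstract describes the fusion pipeline only at a high level. The sketch below is a minimal illustration of the general idea, not the authors' implementation: it assumes the dynamic graph is built from pairwise cosine correlation between text-token and video-frame features, sparsified by a fixed threshold, and encoded with a single message-passing layer. The class name `DynamicGraphFusion`, the feature dimension, and the thresholding rule are all assumptions made for this example.

```python
# Illustrative sketch of dynamic-graph cross-modal fusion (assumptions
# throughout; not the paper's actual architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicGraphFusion(nn.Module):
    """Fuse text-token and video-frame features via a correlation-built graph."""
    def __init__(self, dim: int = 256, threshold: float = 0.1):
        super().__init__()
        self.proj_text = nn.Linear(dim, dim)   # project both modalities into
        self.proj_video = nn.Linear(dim, dim)  # a shared embedding space
        self.msg = nn.Linear(dim, dim)         # message transform (one GNN layer)
        self.threshold = threshold             # assumed edge-sparsification rule

    def forward(self, text_feats, video_feats):
        # text_feats:  (T, dim) word-level text/gloss embeddings
        # video_feats: (V, dim) per-frame visual features
        nodes = torch.cat([self.proj_text(text_feats),
                           self.proj_video(video_feats)], dim=0)  # (T+V, dim)

        # Adjacency from pairwise cosine correlation, recomputed per
        # sample -- this is the "dynamic graph" construction step.
        normed = F.normalize(nodes, dim=-1)
        adj = normed @ normed.t()                      # (T+V, T+V)
        adj = torch.where(adj > self.threshold, adj,
                          torch.zeros_like(adj))       # drop weak edges
        adj = adj / adj.sum(dim=-1, keepdim=True).clamp_min(1e-6)

        # One round of graph message passing over the cross-modal graph.
        fused = nodes + F.relu(self.msg(adj @ nodes))  # (T+V, dim)
        return fused  # multi-modal embeddings for a downstream translator

# Usage: 8 gloss tokens and 20 video frames, 256-d features each.
model = DynamicGraphFusion(dim=256)
out = model(torch.randn(8, 256), torch.randn(20, 256))
print(out.shape)  # torch.Size([28, 256])
```

The fused node embeddings would then feed the subsequent neural translation model; in practice a graph attention mechanism or multiple propagation layers could replace the single linear message step used here for brevity.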
