Paper Title

Leveraging Graph-based Cross-modal Information Fusion for Neural Sign Language Translation

Paper Authors

Jiangbin Zheng, Siyuan Li, Cheng Tan, Chong Wu, Yidong Chen, Stan Z. Li

Paper Abstract

Sign Language (SL), as the mother tongue of the deaf community, is a special visual language that most hearing people cannot understand. In recent years, neural Sign Language Translation (SLT), as a possible way of bridging the communication gap between the deaf and the hearing, has attracted widespread academic attention. We found that current mainstream end-to-end neural SLT models, which try to learn language knowledge in a weakly supervised manner, cannot mine enough semantic information under low-data-resource conditions. Therefore, we propose to introduce additional word-level semantic knowledge from sign language linguistics to help improve current end-to-end neural SLT models. Concretely, we propose a novel neural SLT model with multi-modal feature fusion based on a dynamic graph, in which the cross-modal information, i.e., text and video, is first assembled into a dynamic graph according to its correlation; the graph is then processed by a multi-modal graph encoder to generate multi-modal embeddings for use in the subsequent neural translation model. To the best of our knowledge, we are the first to introduce graph neural networks, for fusing multi-modal information, into neural sign language translation models. Moreover, we conducted experiments on the publicly available SLT dataset RWTH-PHOENIX-Weather-2014T, and the quantitative results show that our method improves the model's performance.
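
The abstract describes the fusion pipeline only at a high level. The sketch below is a minimal illustration of the general idea, not the authors' implementation: it assumes the dynamic graph is built from pairwise cosine correlation between text-token and video-frame features, sparsified by a fixed threshold, and encoded with a single message-passing layer. The class name `DynamicGraphFusion`, the feature dimension, and the thresholding rule are all assumptions made for this example.

```python
# Illustrative sketch of dynamic-graph cross-modal fusion (assumptions
# throughout; not the paper's actual architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicGraphFusion(nn.Module):
    """Fuse text-token and video-frame features via a correlation-built graph."""
    def __init__(self, dim: int = 256, threshold: float = 0.1):
        super().__init__()
        self.proj_text = nn.Linear(dim, dim)   # project both modalities into
        self.proj_video = nn.Linear(dim, dim)  # a shared embedding space
        self.msg = nn.Linear(dim, dim)         # message transform (one GNN layer)
        self.threshold = threshold             # assumed edge-sparsification rule

    def forward(self, text_feats, video_feats):
        # text_feats:  (T, dim) word-level text/gloss embeddings
        # video_feats: (V, dim) per-frame visual features
        nodes = torch.cat([self.proj_text(text_feats),
                           self.proj_video(video_feats)], dim=0)  # (T+V, dim)

        # Adjacency from pairwise cosine correlation, recomputed per
        # sample -- this is the "dynamic graph" construction step.
        normed = F.normalize(nodes, dim=-1)
        adj = normed @ normed.t()                      # (T+V, T+V)
        adj = torch.where(adj > self.threshold, adj,
                          torch.zeros_like(adj))       # drop weak edges
        adj = adj / adj.sum(dim=-1, keepdim=True).clamp_min(1e-6)

        # One round of graph message passing over the cross-modal graph.
        fused = nodes + F.relu(self.msg(adj @ nodes))  # (T+V, dim)
        return fused  # multi-modal embeddings for a downstream translator

# Usage: 8 gloss tokens and 20 video frames, 256-d features each.
model = DynamicGraphFusion(dim=256)
out = model(torch.randn(8, 256), torch.randn(20, 256))
print(out.shape)  # torch.Size([28, 256])
```

The fused node embeddings would then feed the subsequent neural translation model; in practice a graph attention mechanism or multiple propagation layers could replace the single linear message step used here for brevity.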
