Paper Title

A Deep Investigation of RNN and Self-attention for the Cyrillic-Traditional Mongolian Bidirectional Conversion

Paper Authors

Muhan Na, Rui Liu, Feilong Bao, Guanglai Gao

Paper Abstract

Cyrillic and Traditional Mongolian are the two main members of the Mongolian writing system. The Cyrillic-Traditional Mongolian Bidirectional Conversion (CTMBC) task comprises two conversion processes: Cyrillic Mongolian to Traditional Mongolian (C2T) and Traditional Mongolian to Cyrillic Mongolian (T2C). Previous researchers adopted the traditional joint sequence model, since the CTMBC task is a natural Sequence-to-Sequence (Seq2Seq) modeling problem. Recent studies have shown that Recurrent Neural Network (RNN) and Self-attention (or Transformer) based encoder-decoder models bring significant improvements to machine translation between major languages such as Mandarin, English, and French. However, it remains an open question whether CTMBC quality can be improved by using RNN and Transformer models. To answer this question, this paper investigates the utility of these two powerful techniques for the CTMBC task, taking into account the agglutinative characteristics of the Mongolian language. We build encoder-decoder based CTMBC models on RNN and Transformer respectively and compare different network configurations in depth. The experimental results show that both the RNN and Transformer models outperform the traditional joint sequence model, with the Transformer achieving the best performance. Compared with the joint sequence baseline, the word error rate (WER) of the Transformer decreased by 5.72% for C2T and 5.06% for T2C.
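The abstract frames CTMBC as a Seq2Seq problem solved with an encoder-decoder model. As a rough illustration of what the Transformer variant looks like, here is a minimal PyTorch sketch of a character-level encoder-decoder for a C2T/T2C-style conversion task. This is not the authors' implementation: the vocabulary sizes, hyperparameters, and toy inputs below are hypothetical placeholders.

```python
# Minimal sketch of a character-level Transformer encoder-decoder for a
# Seq2Seq conversion task (e.g., C2T or T2C). Illustrative only; all
# sizes and data are made up.
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Standard sinusoidal positional encoding."""
    def __init__(self, d_model, max_len=512):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        pos = torch.arange(max_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, d_model, 2).float()
                        * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        return x + self.pe[:, : x.size(1)]

class ConversionTransformer(nn.Module):
    """Encoder-decoder over source/target character vocabularies."""
    def __init__(self, src_vocab, tgt_vocab, d_model=256, nhead=4, layers=3):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.pos = PositionalEncoding(d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=layers, num_decoder_layers=layers,
            batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src, tgt):
        # Causal mask: each target position attends only to earlier ones.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        h = self.transformer(self.pos(self.src_emb(src)),
                             self.pos(self.tgt_emb(tgt)),
                             tgt_mask=tgt_mask)
        return self.out(h)  # per-position logits over target characters

# Toy forward pass with random token ids standing in for characters.
model = ConversionTransformer(src_vocab=100, tgt_vocab=120)
src = torch.randint(0, 100, (2, 16))   # (batch, src_len)
tgt = torch.randint(0, 120, (2, 18))   # (batch, tgt_len)
logits = model(src, tgt)               # (2, 18, 120)
print(logits.shape)
```

An RNN baseline of the kind the paper compares against would keep the same embedding/output layers but replace `nn.Transformer` with recurrent encoder and decoder (e.g., GRU or LSTM) plus attention.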
