Data-adaptive Transfer Learning for Translation: A Case Study in Haitian and Jamaican

Paper title

Data-adaptive Transfer Learning for Translation: A Case Study in Haitian and Jamaican

Authors

Nathaniel R. Robinson, Cameron J. Hogan, Nancy Fulda, David R. Mortensen

Abstract

Multilingual transfer techniques often improve low-resource machine translation (MT). Many of these techniques are applied without considering data characteristics. We show in the context of Haitian-to-English translation that transfer effectiveness is correlated with the amount of training data and the relationships between knowledge-sharing languages. Our experiments suggest that for some languages beyond a threshold of authentic data, back-translation augmentation methods are counterproductive, while cross-lingual transfer from a sufficiently related language is preferred. We complement this finding by contributing a rule-based French-Haitian orthographic and syntactic engine and a novel method for phonological embedding. When used with multilingual techniques, orthographic transformation makes statistically significant improvements over conventional methods. And in very low-resource Jamaican MT, code-switching with a transfer language for orthographic resemblance yields a 6.63 BLEU point advantage.
