Paper Title


FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding

Authors

Yuwei Fang, Shuohang Wang, Zhe Gan, Siqi Sun, Jingjing Liu

Abstract


Large-scale cross-lingual language models (LM), such as mBERT, Unicoder and XLM, have achieved great success in cross-lingual representation learning. However, when applied to zero-shot cross-lingual transfer tasks, most existing methods use only single-language input for LM finetuning, without leveraging the intrinsic cross-lingual alignment between different languages that proves essential for multilingual tasks. In this paper, we propose FILTER, an enhanced fusion method that takes cross-lingual data as input for XLM finetuning. Specifically, FILTER first encodes text input in the source language and its translation in the target language independently in the shallow layers, then performs cross-language fusion to extract multilingual knowledge in the intermediate layers, and finally performs further language-specific encoding. During inference, the model makes predictions based on the text input in the target language and its translation in the source language. For simple tasks such as classification, translated text in the target language shares the same label as the source language. However, this shared label becomes less accurate or even unavailable for more complex tasks such as question answering, NER and POS tagging. To tackle this issue, we further propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language. Extensive experiments demonstrate that FILTER achieves new state of the art on two challenging multilingual multi-task benchmarks, XTREME and XGLUE.
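The KL-divergence self-teaching loss described above can be sketched in a few lines: the model's own predictions on the source-language input serve as soft pseudo-labels (the "teacher"), and the predictions on the translated target-language text (the "student") are pulled toward them. The sketch below is a minimal plain-Python illustration under assumed names (`softmax`, `kl_self_teaching_loss`), not the authors' implementation, which operates on per-token logits inside the XLM encoder.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_self_teaching_loss(teacher_logits, student_logits):
    """KL(teacher || student).

    teacher_logits: model output on the source-language text,
        treated as fixed soft pseudo-labels (no gradient flows here).
    student_logits: model output on the translated target-language
        text, which the loss pushes toward the teacher distribution.
    """
    p = softmax(teacher_logits)  # soft pseudo-labels
    q = softmax(student_logits)  # predictions on translated text
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

When the two distributions agree the loss is zero, and it grows as they diverge, which is why it can supervise tasks like QA, NER, and POS tagging where a hard shared label across languages is unreliable or unavailable.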
