Paper Title

On Long-Tailed Phenomena in Neural Machine Translation

Authors

Vikas Raunak, Siddharth Dalmia, Vivek Gupta, Florian Metze

Abstract

State-of-the-art Neural Machine Translation (NMT) models struggle with generating low-frequency tokens, tackling which remains a major challenge. The analysis of long-tailed phenomena in the context of structured prediction tasks is further hindered by the added complexities of search during inference. In this work, we quantitatively characterize such long-tailed phenomena at two levels of abstraction, namely, token classification and sequence generation. We propose a new loss function, the Anti-Focal loss, to better adapt model training to the structural dependencies of conditional text generation by incorporating the inductive biases of beam search in the training process. We show the efficacy of the proposed technique on a number of Machine Translation (MT) datasets, demonstrating that it leads to significant gains over cross-entropy across different language pairs, especially on the generation of low-frequency words. We have released the code to reproduce our results.
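The abstract names the Anti-Focal loss but does not give its formula. As a rough illustration of the idea, the sketch below contrasts standard cross-entropy, the focal loss (Lin et al., which down-weights confident predictions), and an "anti-focal" re-weighting that does the opposite: it allocates relatively more loss to high-confidence predictions, in line with the abstract's stated goal of matching beam search's preference for confident tokens. The exact functional form here (a `(1 + p)^gamma` modulating factor) is an assumption for illustration, not necessarily the form used in the paper.

```python
import math

def cross_entropy(p: float) -> float:
    """Standard token-level cross-entropy, given the model's
    probability p for the gold token."""
    return -math.log(p)

def focal_loss(p: float, gamma: float = 1.0) -> float:
    """Focal loss (Lin et al.): the (1 - p)^gamma factor shrinks the
    loss on confident (easy) predictions."""
    return -((1.0 - p) ** gamma) * math.log(p)

def anti_focal_loss(p: float, gamma: float = 1.0) -> float:
    """Assumed anti-focal form: the (1 + p)^gamma factor instead grows
    with confidence, so high-confidence gold tokens receive relatively
    more loss than under plain cross-entropy. gamma = 0 recovers
    cross-entropy exactly. NOTE: illustrative assumption, not the
    paper's verified definition."""
    return -((1.0 + p) ** gamma) * math.log(p)
```

With gamma = 0 all three reduce to cross-entropy; as gamma grows, focal and anti-focal pull the per-token weighting in opposite directions, which is the contrast the abstract draws on.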
