Paper Title

Applying the Transformer to Character-level Transduction

Authors

Shijie Wu, Ryan Cotterell, Mans Hulden

Abstract

The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks. Yet for character-level transduction tasks, e.g. morphological inflection generation and historical text normalization, there are few works that outperform recurrent models using the transformer. In an empirical study, we uncover that, in contrast to recurrent sequence-to-sequence models, the batch size plays a crucial role in the performance of the transformer on character-level tasks, and we show that with a large enough batch size, the transformer does indeed outperform recurrent models. We also introduce a simple technique to handle feature-guided character-level transduction that further improves performance. With these insights, we achieve state-of-the-art performance on morphological inflection and historical text normalization. We also show that the transformer outperforms a strong baseline on two other character-level transduction tasks: grapheme-to-phoneme conversion and transliteration.
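
The abstract mentions a simple technique for feature-guided character-level transduction (e.g., morphological inflection, where the output is conditioned on a lemma plus a set of morphosyntactic tags). As a rough illustration only, the sketch below shows one common way such input is prepared for a character-level sequence-to-sequence model: the feature tags are treated as special tokens and prepended to the character sequence of the lemma. The function name and tag format are hypothetical and are not taken from the paper; the paper's specific handling of these feature tokens inside the transformer is not reproduced here.

```python
# Illustrative sketch (not the paper's code): building a feature-guided
# source sequence for character-level inflection generation.
# Feature tags are treated as special tokens and prepended to the
# character sequence of the lemma; the tag inventory below is made up.

def build_source(lemma: str, features: list[str]) -> list[str]:
    """Return the source token sequence: feature tags followed by characters."""
    feature_tokens = [f"<{f}>" for f in features]  # e.g. "<V>", "<PST>"
    char_tokens = list(lemma)                      # split the lemma into characters
    return feature_tokens + char_tokens


if __name__ == "__main__":
    # Hypothetical example: inflect the English verb "walk" into its past tense.
    src = build_source("walk", ["V", "PST"])
    print(src)  # ['<V>', '<PST>', 'w', 'a', 'l', 'k']
    # A character-level model would be trained to transduce this sequence
    # into the character sequence of the inflected form:
    # ['w', 'a', 'l', 'k', 'e', 'd'].
```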
