Paper Title
Meta-Embeddings Based On Self-Attention
Paper Authors
Paper Abstract
Creating meta-embeddings for better performance in language modelling has received attention lately, and methods based on concatenating, or merely computing the arithmetic mean of, multiple separately trained embeddings have been shown to be beneficial. In this paper, we devise a new meta-embedding model based on the self-attention mechanism, namely the Duo. With less than 0.4M parameters, the Duo mechanism achieves state-of-the-art accuracy in text classification tasks such as 20NG. Additionally, we propose a new meta-embedding sequence-to-sequence model for machine translation, which, to the best of our knowledge, is the first machine translation model based on more than one word embedding. Furthermore, our model outperforms the Transformer not only by achieving better results but also by converging faster on recognized benchmarks, such as the WMT 2014 English-to-French translation task.
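For reference, the concatenation and arithmetic-mean baselines mentioned in the abstract can be sketched in a few lines of PyTorch. This is a minimal illustration, not code from the paper; the vocabulary size, embedding dimension, and token ids below are all assumed for the example.

```python
import torch

# Two hypothetical pre-trained embedding tables over the same vocabulary,
# standing in for separately trained embeddings (e.g. GloVe- and word2vec-like).
vocab_size, dim = 10000, 300
emb_a = torch.nn.Embedding(vocab_size, dim)
emb_b = torch.nn.Embedding(vocab_size, dim)

token_ids = torch.tensor([[1, 42, 7]])          # (batch, seq_len)
e_a, e_b = emb_a(token_ids), emb_b(token_ids)   # each (batch, seq_len, dim)

# Concatenation meta-embedding: keeps all information, doubles the dimension.
meta_concat = torch.cat([e_a, e_b], dim=-1)     # (batch, seq_len, 2 * dim)

# Arithmetic-mean meta-embedding: keeps the dimension, weighs sources equally.
meta_mean = (e_a + e_b) / 2                     # (batch, seq_len, dim)
```

Concatenation preserves every source at the cost of a larger downstream input dimension, while averaging keeps the dimension fixed but gives every source equal weight regardless of its quality.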
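The abstract does not detail the Duo mechanism itself, but a self-attention-style combination of several source embeddings could look roughly like the following sketch. The `AttentionMetaEmbedding` module, its learned query vector, and all shapes are hypothetical illustrations of the general technique, not the paper's actual design.

```python
import torch
import torch.nn.functional as F

class AttentionMetaEmbedding(torch.nn.Module):
    """Hypothetical sketch: attend over the k source embeddings of each
    token with a learned query, then take the weighted sum."""

    def __init__(self, dim):
        super().__init__()
        self.query = torch.nn.Parameter(torch.randn(dim))  # learned query vector
        self.proj = torch.nn.Linear(dim, dim)              # key projection

    def forward(self, embeddings):
        # embeddings: (batch, seq_len, k, dim) -- k source embeddings per token
        keys = self.proj(embeddings)                       # (batch, seq_len, k, dim)
        scores = keys @ self.query / keys.size(-1) ** 0.5  # (batch, seq_len, k)
        weights = F.softmax(scores, dim=-1)                # attention over sources
        # Weighted sum of the source embeddings: (batch, seq_len, dim)
        return (weights.unsqueeze(-1) * embeddings).sum(dim=2)

# Usage with the two embeddings from the previous sketch:
stacked = torch.stack([e_a, e_b], dim=2)        # (batch, seq_len, 2, dim)
combiner = AttentionMetaEmbedding(dim)
meta_attn = combiner(stacked)                   # (batch, seq_len, dim)
```

Unlike the arithmetic mean, this attention weighting lets the model learn, per token, how much each source embedding should contribute.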