Paper Title

Alleviating the Inequality of Attention Heads for Neural Machine Translation

Paper Authors

Zewei Sun, Shujian Huang, Xin-Yu Dai, Jiajun Chen

Paper Abstract

Recent studies show that the attention heads in Transformer are not equal. We relate this phenomenon to the imbalanced training of multi-head attention and the model's dependence on specific heads. To tackle this problem, we propose a simple masking method, HeadMask, implemented in two specific ways. Experiments show that translation improvements are achieved on multiple language pairs. Subsequent empirical analyses also support our assumption and confirm the effectiveness of the method.
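
The abstract only names the method, so as a concrete illustration here is a minimal PyTorch sketch of the general idea of masking whole attention heads during training. It assumes the mask is applied to per-head attention outputs of shape (batch, heads, length, head_dim); the module name RandomHeadMask and its num_masked parameter are hypothetical, not the authors' implementation, and only a random-selection variant is shown.

```python
import torch
import torch.nn as nn


class RandomHeadMask(nn.Module):
    """Zero out a random subset of attention heads during training.

    A hypothetical sketch of head masking: by occasionally silencing
    heads, the model is discouraged from relying on specific heads.
    """

    def __init__(self, num_heads: int, num_masked: int):
        super().__init__()
        assert 0 <= num_masked < num_heads
        self.num_heads = num_heads
        self.num_masked = num_masked

    def forward(self, head_outputs: torch.Tensor) -> torch.Tensor:
        # head_outputs: (batch, num_heads, seq_len, head_dim)
        if not self.training or self.num_masked == 0:
            return head_outputs  # no masking at inference time
        # Choose which heads to silence for this batch.
        dropped = torch.randperm(self.num_heads)[: self.num_masked]
        mask = torch.ones(self.num_heads, device=head_outputs.device)
        mask[dropped] = 0.0
        # Broadcast the per-head mask over batch, positions, and dims.
        return head_outputs * mask.view(1, -1, 1, 1)


# Usage: mask 2 of 8 heads on a dummy attention output.
masker = RandomHeadMask(num_heads=8, num_masked=2)
masker.train()
out = masker(torch.randn(2, 8, 10, 64))
```

Selecting heads by an importance score instead of uniformly at random is a natural variation of the same scheme; the paper's exact two variants are not reproduced here.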
