Paper Title
Evade the Trap of Mediocrity: Promoting Diversity and Novelty in Text Generation via Concentrating Attention
Paper Authors
Paper Abstract
Recently, powerful Transformer architectures have proven superior in generating high-quality sentences. Nevertheless, these models tend to produce dull high-frequency phrases, severely hurting the diversity and novelty of generated text. In this work, we dig into the intrinsic mechanism of this problem and find that sparser attention values in the Transformer could improve diversity. To understand such a phenomenon, we first conduct both empirical and theoretical analysis and then attribute it to representation degeneration caused by the attentive mixture of the hidden states during training. We term this process the Trap of Mediocrity. To escape from such a trap, we introduce a novel attention regularization loss to control the sharpness of the attention distribution, which is transparent to model structures and can be easily implemented within 20 lines of Python code. We prove that this method can be mathematically regarded as learning a Bayesian approximation of posterior attention. Experiments show that our method improves the diversity and novelty of the generated text while maintaining comparable quality on a variety of conditional and unconditional generation tasks.
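Below is a minimal sketch of one way such an attention regularization loss could be implemented, assuming it penalizes the entropy of each attention distribution so that attention becomes sharper. The abstract does not specify the exact form of the regularizer, so the function name, the coefficient, and the entropy-based formulation here are illustrative assumptions, not the paper's actual method.

import torch

def attention_sharpness_loss(attn_weights, coeff=0.1, eps=1e-12):
    # attn_weights: tensor of shape (..., num_queries, num_keys),
    # where each row is a softmax-normalized attention distribution.
    # Entropy of each attention distribution; lower entropy means
    # sharper (more concentrated, sparser) attention.
    entropy = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
    # Returned as a scalar penalty to be added to the task loss.
    return coeff * entropy.mean()

# Assumed usage inside a training step:
# loss = cross_entropy(logits, targets) + attention_sharpness_loss(attn_weights)

In this hypothetical form, the regularizer is agnostic to the model architecture: it only needs access to the attention weights, which matches the abstract's claim that the loss is transparent to model structures and compact to implement.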