Paper Title

cosFormer: Rethinking Softmax in Attention

Paper Authors

Zhen Qin, Weixuan Sun, Hui Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong

Paper Abstract

Transformer has shown great success in natural language processing, computer vision, and audio processing. As one of its core components, softmax attention helps capture long-range dependencies, yet it prohibits scaling up due to its quadratic space and time complexity in the sequence length. Kernel methods are often adopted to reduce this complexity by approximating the softmax operator. Nevertheless, due to the approximation errors, their performance varies across tasks/corpora and suffers crucial drops compared with the vanilla softmax attention. In this paper, we propose a linear transformer called cosFormer that can achieve accuracy comparable to or better than the vanilla transformer in both causal and cross attention. cosFormer is based on two key properties of softmax attention: i) non-negativity of the attention matrix; ii) a non-linear re-weighting scheme that can concentrate the distribution of the attention matrix. As its linear substitute, cosFormer fulfills these properties with a linear operator and a cosine-based distance re-weighting mechanism. Extensive experiments on language modeling and text understanding tasks demonstrate the effectiveness of our method. We further examine our method on long sequences and achieve state-of-the-art performance on the Long-Range Arena benchmark. The source code is available at https://github.com/OpenNLPLab/cosFormer.
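To make the two properties concrete, below is a minimal sketch of a non-causal, single-head cosFormer-style linear attention step: a ReLU feature map enforces non-negativity, and the cosine re-weighting is split via cos(a - b) = cos(a)cos(b) + sin(a)sin(b) so the key-value products can be aggregated once, keeping the cost linear in sequence length. This is an illustrative assumption-based sketch (function name, shapes, and the `eps` stabilizer are ours), not the official implementation; see the linked repository for that.

```python
import torch


def cosformer_attention(q, k, v, eps=1e-6):
    """q, k, v: (seq_len, dim) tensors for a single head (non-causal sketch)."""
    n, _ = q.shape
    m, _ = k.shape
    max_len = max(n, m)

    # Property i): non-negative feature maps via ReLU.
    q, k = torch.relu(q), torch.relu(k)

    # Property ii): cosine re-weighting cos(pi/2 * (i - j) / M), decomposed
    # into cos/sin parts so it factorizes over queries and keys.
    idx_q = torch.arange(n, dtype=q.dtype)[:, None] * (torch.pi / 2) / max_len
    idx_k = torch.arange(m, dtype=k.dtype)[:, None] * (torch.pi / 2) / max_len
    q_cos, q_sin = q * torch.cos(idx_q), q * torch.sin(idx_q)
    k_cos, k_sin = k * torch.cos(idx_k), k * torch.sin(idx_k)

    # Linear-complexity aggregation: compute (key^T value) once, reuse per query.
    kv_cos = k_cos.T @ v  # (dim, dim)
    kv_sin = k_sin.T @ v
    num = q_cos @ kv_cos + q_sin @ kv_sin               # numerator
    den = q_cos @ k_cos.sum(0) + q_sin @ k_sin.sum(0)   # row-wise normalizer
    return num / (den[:, None] + eps)


out = cosformer_attention(torch.randn(128, 64), torch.randn(128, 64), torch.randn(128, 64))
print(out.shape)  # torch.Size([128, 64])
```

Because the numerator and normalizer are built from (dim x dim) summaries of the keys and values rather than an explicit (seq x seq) attention matrix, memory and time grow linearly with sequence length, which is what enables the long-sequence results reported on the Long-Range Arena benchmark.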
