Paper Title
Why self-attention is Natural for Sequence-to-Sequence Problems? A Perspective from Symmetries
Paper Authors
Paper Abstract
In this paper, we show that structures similar to self-attention arise naturally when learning many sequence-to-sequence problems, from the perspective of symmetry. Inspired by language processing applications, we study the orthogonal equivariance of seq2seq functions with knowledge: functions that take two inputs -- an input sequence and a ``knowledge'' set -- and output another sequence. The knowledge consists of a set of vectors in the same embedding space as the input sequence, encoding information about the language used to process the input sequence. We show that orthogonal equivariance in the embedding space is natural for seq2seq functions with knowledge, and that under such equivariance the function must take a form close to self-attention. This shows that network structures similar to self-attention are the right structures for representing the target functions of many seq2seq problems. The representation can be further refined if a ``finite information principle'' is assumed, or if permutation equivariance holds over the elements of the input sequence.
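To make the symmetry concrete, here is a minimal sketch (my own illustration, not the paper's construction or exact result): a projection-free dot-product attention over an input sequence `X` and a set of knowledge vectors `K`, both living in the same embedding space. The function name `attend` and the variable names are hypothetical. Because the attention scores depend only on inner products, applying the same orthogonal map `R` to the inputs and the knowledge rotates the output by `R` as well, i.e. f(XR, KR) = f(X, K) R, which is the orthogonal equivariance the abstract refers to.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attend(X, K):
    """X: (n, d) input sequence; K: (m, d) knowledge vectors (hypothetical names).
    Scores use inner products only, so they are unchanged when the same
    orthogonal map R is applied to both X and K; the output lies in the
    embedding space and therefore transforms by the same R."""
    scores = X @ K.T                   # (n, m) inner products
    weights = softmax(scores, axis=-1) # attention weights over knowledge
    return weights @ K                 # (n, d) outputs in the embedding space

rng = np.random.default_rng(0)
d, n, m = 8, 5, 12
X = rng.normal(size=(n, d))
K = rng.normal(size=(m, d))

# Random orthogonal matrix R from the QR decomposition of a Gaussian matrix.
R, _ = np.linalg.qr(rng.normal(size=(d, d)))

lhs = attend(X @ R, K @ R)   # rotate the embedding space, then attend
rhs = attend(X, K) @ R       # attend, then rotate the output
print(np.allclose(lhs, rhs))  # True: f(XR, KR) = f(X, K) R
```

This toy version omits the learned query/key/value projections of standard self-attention; the paper's point, as summarized in the abstract, is that imposing this orthogonal equivariance on seq2seq functions with knowledge forces a form close to self-attention, rather than that self-attention reduces to this sketch.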