Paper Title
Seeing Both the Forest and the Trees: Multi-head Attention for Joint Classification on Different Compositional Levels
Paper Authors
Paper Abstract
In natural languages, words are used in association to construct sentences. It is not words in isolation, but the appropriate combination of hierarchical structures that conveys the meaning of the whole sentence. Neural networks can capture expressive language features; however, insights into the link between words and sentences are difficult to acquire automatically. In this work, we design a deep neural network architecture that explicitly wires lower and higher linguistic components; we then evaluate its ability to perform the same task at different hierarchical levels. Settling on broad text classification tasks, we show that our model, MHAL, learns to simultaneously solve them at different levels of granularity by fluidly transferring knowledge between hierarchies. Using a multi-head attention mechanism to tie the representations between single words and full sentences, MHAL systematically outperforms equivalent models that are not incentivized towards developing compositional representations. Moreover, we demonstrate that, with the proposed architecture, the sentence information flows naturally to individual words, allowing the model to behave like a sequence labeller (which is a lower, word-level task) even without any word supervision, in a zero-shot fashion.
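The abstract describes tying word-level and sentence-level predictions through multi-head attention, but gives no implementation details. The sketch below is one plausible reading, not the authors' actual architecture: it assumes a PyTorch BiLSTM encoder and assigns one attention head per output class, so each head's per-word scores can double as word-level label evidence (which would explain the zero-shot sequence-labelling behaviour), while the attention-pooled vectors feed a sentence-level classifier. All names (`MHALSketch`, `head_scores`, `sent_clf`) and dimensions are hypothetical.

```python
# Minimal sketch of label-tied multi-head attention for joint word- and
# sentence-level classification. Assumptions: PyTorch, a BiLSTM encoder,
# and one attention head per class; the real MHAL may differ.
import torch
import torch.nn as nn


class MHALSketch(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim, num_classes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        d = 2 * hidden_dim
        # One attention head per class: head c scores how strongly each
        # word supports class c.
        self.head_scores = nn.Linear(d, num_classes)
        self.sent_clf = nn.Linear(d * num_classes, num_classes)

    def forward(self, token_ids):
        h, _ = self.encoder(self.embed(token_ids))      # (B, T, d)
        scores = self.head_scores(h)                    # (B, T, C)
        # The per-word head scores serve directly as word-level logits,
        # so sequence labelling needs no extra word-level supervision.
        word_logits = scores
        attn = torch.softmax(scores, dim=1)             # normalise over words
        # Each head pools a class-specific sentence representation.
        sent_vecs = torch.einsum("btc,btd->bcd", attn, h)   # (B, C, d)
        sent_logits = self.sent_clf(sent_vecs.flatten(1))   # (B, C)
        return word_logits, sent_logits
```

Under this reading, a joint loss on `sent_logits` (supervised) and, when available, `word_logits` would incentivize compositional representations: sentence-level gradients flow through the attention weights and shape the word-level scores, which is one way the sentence information could "flow naturally to individual words" as the abstract claims.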