Paper Title
LittleBird: Efficient Faster & Longer Transformer for Question Answering
Paper Authors
Paper Abstract
BERT has shown a lot of success in a wide variety of NLP tasks. However, it is limited in handling long inputs due to its attention mechanism. Longformer, ETC, and BigBird addressed this issue and effectively solved the quadratic dependency problem. However, we find that these models are not sufficient, and propose LittleBird, a novel model based on BigBird with improved speed and memory footprint while maintaining accuracy. In particular, we devise a more flexible and efficient position representation method based on Attention with Linear Biases (ALiBi). We also show that replacing BigBird's method of representing global information with pack and unpack attention is more effective. The proposed model can work on long inputs even after being pre-trained on short inputs, and can be trained efficiently by reusing an existing pre-trained language model for short inputs. This is a significant benefit for low-resource languages, where large amounts of long text data are difficult to obtain. As a result, our experiments show that LittleBird works very well in a variety of languages, achieving high performance on question answering tasks, particularly on KorQuAD 2.0, a Korean question answering dataset for long paragraphs.
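To make the ALiBi idea mentioned in the abstract concrete, below is a minimal sketch of Attention with Linear Biases: instead of adding position embeddings, each attention head gets a linear penalty proportional to the query-key distance, added directly to the attention scores. The function and parameter names (`alibi_bias`, `attention_with_alibi`, `n_heads`, `seq_len`) are illustrative, not taken from the paper, and the symmetric (bidirectional) distance used here is one common adaptation of the originally causal ALiBi; LittleBird's exact variant may differ.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Head-specific slopes form a geometric sequence, as in the original ALiBi paper.
    slopes = torch.tensor([2 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    # Absolute distance |i - j| between query position i and key position j.
    pos = torch.arange(seq_len)
    dist = (pos[None, :] - pos[:, None]).abs().float()        # (seq_len, seq_len)
    # Negative slope times distance: nearby tokens are penalized less than distant ones.
    return -slopes[:, None, None] * dist[None, :, :]          # (n_heads, seq_len, seq_len)

def attention_with_alibi(q, k, v):
    # q, k, v: (n_heads, seq_len, head_dim)
    n_heads, seq_len, head_dim = q.shape
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5        # raw attention scores
    scores = scores + alibi_bias(n_heads, seq_len)            # add the linear distance bias
    return torch.softmax(scores, dim=-1) @ v
```

Because the bias depends only on relative distance, a model trained this way has no learned position table tied to a fixed maximum length, which is consistent with the abstract's claim that the model can handle inputs longer than those seen during pre-training.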