Paper Title

Efficient Long-Text Understanding with Short-Text Models

Paper Authors

Maor Ivgi, Uri Shaham, Jonathan Berant

Paper Abstract

Transformer-based pretrained language models (LMs) are ubiquitous across natural language understanding, but cannot be applied to long sequences such as stories, scientific articles and long documents, due to their quadratic complexity. While a myriad of efficient transformer variants have been proposed, they are typically based on custom implementations that require expensive pretraining from scratch. In this work, we propose SLED: SLiding-Encoder and Decoder, a simple approach for processing long sequences that re-uses and leverages battle-tested short-text pretrained LMs. Specifically, we partition the input into overlapping chunks, encode each with a short-text LM encoder and use the pretrained decoder to fuse information across chunks (fusion-in-decoder). We illustrate through controlled experiments that SLED offers a viable strategy for long text understanding and evaluate our approach on SCROLLS, a benchmark with seven datasets across a wide range of language understanding tasks. We find that SLED is competitive with specialized models that are up to 50x larger and require a dedicated and expensive pretraining step.
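
The following is a minimal sketch of the idea described in the abstract: split a long input into overlapping chunks with a sliding window, encode each chunk independently with a short-text encoder, then let the unmodified pretrained decoder cross-attend over the concatenated chunk representations (fusion-in-decoder). It assumes a recent Hugging Face transformers version whose generate() accepts precomputed encoder_outputs; the checkpoint, chunk length, stride, and the chunk_input() helper are illustrative choices, not the authors' exact configuration.

```python
# Sketch of chunked encoding + fusion-in-decoder with a short-text BART model.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers.modeling_outputs import BaseModelOutput

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")


def chunk_input(input_ids, chunk_len=256, stride=128):
    """Split a long token sequence into overlapping chunks (sliding window)."""
    chunks = []
    for start in range(0, input_ids.size(0), stride):
        chunks.append(input_ids[start:start + chunk_len])
        if start + chunk_len >= input_ids.size(0):
            break
    return chunks


long_text = "..."  # a long document, e.g. a story or a scientific article
input_ids = tokenizer(long_text, return_tensors="pt").input_ids[0]

with torch.no_grad():
    # 1) Encode each overlapping chunk independently with the short-text encoder.
    encoder = model.get_encoder()
    chunk_states = [
        encoder(input_ids=chunk.unsqueeze(0)).last_hidden_state
        for chunk in chunk_input(input_ids)
    ]

    # 2) Concatenate the per-chunk states along the sequence axis so the
    #    pretrained decoder can cross-attend over all chunks at once
    #    (fusion-in-decoder).
    fused = torch.cat(chunk_states, dim=1)
    encoder_outputs = BaseModelOutput(last_hidden_state=fused)

    # 3) Decode over the fused representation with the unmodified decoder.
    output_ids = model.generate(encoder_outputs=encoder_outputs, max_length=64)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because each chunk is encoded independently, the encoder cost grows linearly with document length rather than quadratically, while the decoder still sees the entire document through its cross-attention over the fused states; the full method in the paper adds refinements not shown in this sketch.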
