Paper Title
Adapting Pretrained Text-to-Text Models for Long Text Sequences
Paper Authors
Paper Abstract
We present an empirical study of adapting an existing pretrained text-to-text model for long-sequence inputs. Through a comprehensive study along three axes of the pretraining pipeline (model architecture, optimization objective, and pretraining corpus), we propose an effective recipe to build long-context models from existing short-context models. Specifically, we replace the full attention in transformers with pooling-augmented blockwise attention, and pretrain the model with a masked-span prediction task using spans of varying length. In terms of the pretraining corpus, we find that randomly concatenating short documents from a large open-domain corpus yields better performance than using existing long-document corpora, which are typically limited in their domain coverage. With these findings, we build a long-context model that achieves competitive performance on long-text QA tasks and establishes a new state of the art on five long-text summarization datasets, often outperforming previous methods that use larger models. Our code has been released at https://github.com/facebookresearch/bart_ls.
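Two of the three axes in the abstract, the optimization objective and the pretraining corpus, are data-side choices. The sketch below is a rough illustration only, not the authors' released implementation (see the linked repository for that): it shows one way to (a) form long pretraining sequences by randomly concatenating tokenized short documents from an open-domain corpus and (b) corrupt them with variable-length masked spans for a span-prediction objective. The function names, sentinel-id scheme, and all hyperparameters are illustrative assumptions.

```python
# Illustrative sketch (assumed, not the released bart_ls code) of:
#   (a) building long inputs by randomly concatenating short documents, and
#   (b) masking spans of varying length for a span-prediction objective.
import random
from typing import List, Tuple

def concat_short_documents(docs: List[List[int]], target_len: int) -> List[int]:
    """Randomly concatenate tokenized short documents until the sequence reaches target_len tokens."""
    sequence: List[int] = []
    while len(sequence) < target_len:
        sequence.extend(random.choice(docs))
    return sequence[:target_len]

def mask_spans(tokens: List[int], mask_ratio: float = 0.15,
               mean_span_len: int = 5, sentinel_start: int = 32000) -> Tuple[List[int], List[int]]:
    """Replace spans of varying length with sentinel ids; return (corrupted input, decoder targets)."""
    budget = int(len(tokens) * mask_ratio)   # cap on total masked tokens
    corrupted: List[int] = []
    targets: List[int] = []
    i, sentinel, masked = 0, sentinel_start, 0
    while i < len(tokens):
        # Trigger a span start with probability chosen so ~mask_ratio of tokens get masked.
        if masked < budget and random.random() < mask_ratio / mean_span_len:
            span_len = max(1, int(random.expovariate(1.0 / mean_span_len)))  # varying span length
            span_len = min(span_len, len(tokens) - i)
            corrupted.append(sentinel)                  # span collapsed to one sentinel in the input
            targets.append(sentinel)                    # target: sentinel followed by the masked tokens
            targets.extend(tokens[i:i + span_len])
            sentinel += 1
            masked += span_len
            i += span_len
        else:
            corrupted.append(tokens[i])
            i += 1
    return corrupted, targets
```

As the abstract argues, assembling long sequences from randomly concatenated open-domain short documents trades away genuine long-range document structure for much broader domain coverage, which the paper finds to be the better bargain compared with existing long-document corpora.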