Paper Title
Adapting Pretrained Text-to-Text Models for Long Text Sequences
Paper Authors
Paper Abstract
We present an empirical study of adapting an existing pretrained text-to-text model for long-sequence inputs. Through a comprehensive study along three axes of the pretraining pipeline (model architecture, optimization objective, and pretraining corpus), we propose an effective recipe to build long-context models from existing short-context models. Specifically, we replace the full attention in transformers with pooling-augmented blockwise attention, and pretrain the model with a masked-span prediction task using spans of varying length. In terms of the pretraining corpus, we find that randomly concatenating short documents from a large open-domain corpus yields better performance than using existing long-document corpora, which are typically limited in their domain coverage. With these findings, we build a long-context model that achieves competitive performance on long-text QA tasks and establishes a new state of the art on five long-text summarization datasets, often outperforming previous methods that use larger models. Our code has been released at https://github.com/facebookresearch/bart_ls.
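Two of the three axes in the abstract, the optimization objective and the pretraining corpus, are data-side choices. The sketch below is a rough illustration only, not the authors' released implementation (see the linked repository for that): it shows one way to (a) form long pretraining sequences by randomly concatenating tokenized short documents from an open-domain corpus and (b) corrupt them with variable-length masked spans for a span-prediction objective. The function names, sentinel-id scheme, and all hyperparameters are illustrative assumptions.

```python
# Illustrative sketch (assumed, not the released bart_ls code) of:
#   (a) building long inputs by randomly concatenating short documents, and
#   (b) masking spans of varying length for a span-prediction objective.
import random
from typing import List, Tuple

def concat_short_documents(docs: List[List[int]], target_len: int) -> List[int]:
    """Randomly concatenate tokenized short documents until the sequence reaches target_len tokens."""
    sequence: List[int] = []
    while len(sequence) < target_len:
        sequence.extend(random.choice(docs))
    return sequence[:target_len]

def mask_spans(tokens: List[int], mask_ratio: float = 0.15,
               mean_span_len: int = 5, sentinel_start: int = 32000) -> Tuple[List[int], List[int]]:
    """Replace spans of varying length with sentinel ids; return (corrupted input, decoder targets)."""
    budget = int(len(tokens) * mask_ratio)   # cap on total masked tokens
    corrupted: List[int] = []
    targets: List[int] = []
    i, sentinel, masked = 0, sentinel_start, 0
    while i < len(tokens):
        # Trigger a span start with probability chosen so ~mask_ratio of tokens get masked.
        if masked < budget and random.random() < mask_ratio / mean_span_len:
            span_len = max(1, int(random.expovariate(1.0 / mean_span_len)))  # varying span length
            span_len = min(span_len, len(tokens) - i)
            corrupted.append(sentinel)                  # span collapsed to one sentinel in the input
            targets.append(sentinel)                    # target: sentinel followed by the masked tokens
            targets.extend(tokens[i:i + span_len])
            sentinel += 1
            masked += span_len
            i += span_len
        else:
            corrupted.append(tokens[i])
            i += 1
    return corrupted, targets
```

As the abstract argues, assembling long sequences from randomly concatenated open-domain short documents trades away genuine long-range document structure for much broader domain coverage, which the paper finds to be the better bargain compared with existing long-document corpora.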