Paper Title
What's New? Summarizing Contributions in Scientific Literature
Paper Authors
Paper Abstract
With thousands of academic articles shared on a daily basis, it has become increasingly difficult to keep up with the latest scientific findings. To overcome this problem, we introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work, making it easier to identify the key findings shared in articles. For this purpose, we extend the S2ORC corpus of academic articles, which spans a diverse set of domains ranging from economics to psychology, by adding disentangled "contribution" and "context" reference labels. Together with the dataset, we introduce and analyze three baseline approaches: 1) a unified model controlled by input code prefixes, 2) a model with separate generation heads specialized in generating the disentangled outputs, and 3) a training strategy that guides the model using additional supervision coming from inbound and outbound citations. We also propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs. Through a human study involving expert annotators, we show that in 79% of cases our new task is considered more helpful than traditional scientific paper summarization.
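The first baseline, a single unified model steered by input code prefixes, can be illustrated as a data-preparation step. This is a minimal sketch under stated assumptions, not the authors' implementation: the prefix strings `<contrib>` and `<context>`, the `Example` type, and the helper `build_examples` are hypothetical, and the sequence-to-sequence model that consumes these pairs is left out. Each paper is duplicated into two training pairs that differ only in their prefix, so one shared model learns to emit either the contribution summary or the context summary depending on the code it sees.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical control codes (assumption): the actual prefix tokens are not specified in the abstract.
CONTROL_CODES = {"contribution": "<contrib>", "context": "<context>"}


@dataclass
class Example:
    source: str  # model input: control code followed by the paper text
    target: str  # reference summary for the aspect selected by the code


def build_examples(paper_text: str, contribution_summary: str, context_summary: str) -> List[Example]:
    """Turn one paper into two training pairs for a single prefix-controlled model.

    The same paper text is repeated with a different leading control code;
    the target is the reference summary matching that code.
    """
    targets = {"contribution": contribution_summary, "context": context_summary}
    return [
        Example(source=f"{CONTROL_CODES[aspect]} {paper_text}", target=targets[aspect])
        for aspect in ("contribution", "context")
    ]


if __name__ == "__main__":
    pairs = build_examples(
        "We propose a new dataset for summarizing scientific papers.",
        "A dataset with separate contribution and context summaries.",
        "Prior work summarizes each paper as a single abstract.",
    )
    for pair in pairs:
        print(pair.source.split()[0], "->", pair.target)
```

At inference time, the same assumed convention applies: prepend `<contrib>` or `<context>` to the paper text and decode once per prefix. The appeal of this design is that both outputs share all model parameters, in contrast to the second baseline's separate generation heads.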