Paper Title

NLPeer: A Unified Resource for the Computational Study of Peer Review

Paper Authors

Nils Dycke, Ilia Kuznetsov, Iryna Gurevych

Paper Abstract


Peer review constitutes a core component of scholarly publishing; yet it demands substantial expertise and training, and is susceptible to errors and biases. Various applications of NLP for peer reviewing assistance aim to support reviewers in this complex process, but the lack of clearly licensed datasets and multi-domain corpora prevents the systematic study of NLP for peer review. To remedy this, we introduce NLPeer -- the first ethically sourced multi-domain corpus of more than 5k papers and 11k review reports from five different venues. In addition to the new datasets of paper drafts, camera-ready versions and peer reviews from the NLP community, we establish a unified data representation and augment previous peer review datasets to include parsed and structured paper representations, rich metadata and versioning information. We complement our resource with implementations and analysis of three reviewing assistance tasks, including a novel guided skimming task. Our work paves the way towards a systematic, multi-faceted, evidence-based study of peer review in NLP and beyond. The data and code are publicly available.
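To make the idea of a "unified data representation" concrete, the sketch below shows what a single record tying together paper versions, reviews, and metadata could look like. This is a minimal, hypothetical illustration; the class and field names are assumptions for exposition, not the actual NLPeer schema.

```python
from dataclasses import dataclass, field

@dataclass
class ReviewReport:
    """One review report attached to a paper (illustrative fields only)."""
    reviewer_id: str
    text: str
    scores: dict = field(default_factory=dict)

@dataclass
class PaperRecord:
    """A unified record: venue, versioned paper texts, reviews, metadata."""
    venue: str
    title: str
    versions: list                      # e.g. draft and camera-ready texts
    reviews: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

# Example record linking a draft and camera-ready version with one review.
record = PaperRecord(
    venue="ExampleVenue",
    title="An Example Paper",
    versions=["draft text ...", "camera-ready text ..."],
    reviews=[ReviewReport("R1", "Well-motivated study.", {"overall": 4})],
    metadata={"year": 2023},
)
print(len(record.versions), len(record.reviews))
```

A schema along these lines lets versioning information (draft vs. camera-ready) and review reports be queried together, which is what enables tasks such as guided skimming across paper versions.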
