Paper Title

NLPContributions: An Annotation Scheme for Machine Reading of Scholarly Contributions in Natural Language Processing Literature

Authors

Jennifer D'Souza, Sören Auer

Abstract

We describe an annotation initiative to capture the scholarly contributions in natural language processing (NLP) articles, particularly articles that discuss machine learning (ML) approaches for various information extraction tasks. We develop the annotation task based on a pilot annotation exercise on 50 NLP-ML scholarly articles presenting contributions to five information extraction tasks: 1) machine translation, 2) named entity recognition, 3) question answering, 4) relation classification, and 5) text classification. In this article, we describe the outcomes of this pilot annotation phase. Through the exercise, we obtained an annotation methodology and found ten core information units that reflect the contributions of the NLP-ML scholarly investigations. The resulting annotation scheme we developed based on these information units is called NLPContributions. The overarching goal of our endeavor is four-fold: 1) to find a systematic set of patterns of subject-predicate-object statements for the semantic structuring of scholarly contributions that are more or less generically applicable to NLP-ML research articles; 2) to apply the discovered patterns in the creation of a larger annotated dataset for training machine readers of research contributions; 3) to ingest the dataset into the Open Research Knowledge Graph (ORKG) infrastructure as a showcase for creating user-friendly state-of-the-art overviews; 4) to integrate the machine readers into the ORKG to assist users in the manual curation of their respective article contributions. We envision that the NLPContributions methodology will engender a wider discussion on the topic, leading to its further refinement and development. Our pilot dataset of 50 NLP-ML scholarly articles, annotated according to the NLPContributions scheme, is openly available to the research community at https://doi.org/10.25835/0019761.
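To make the idea of subject-predicate-object statements concrete, the following is a minimal sketch of how one article's contribution might be stored as triples and grouped by subject. The `Triple` type, the `group_by_subject` helper, and every label in the example data are hypothetical illustrations; they are not taken from the NLPContributions scheme, the published dataset, or the ORKG API.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement (hypothetical representation)."""
    subject: str
    predicate: str
    obj: str


# Invented example statements; the labels are assumptions for illustration,
# not annotations from the actual dataset.
triples = [
    Triple("Contribution", "has research problem", "named entity recognition"),
    Triple("Contribution", "has model", "BiLSTM-CRF tagger"),
    Triple("BiLSTM-CRF tagger", "uses", "character embeddings"),
]


def group_by_subject(statements):
    """Collect (predicate, object) pairs under each subject, preserving order."""
    grouped = defaultdict(list)
    for t in statements:
        grouped[t.subject].append((t.predicate, t.obj))
    return dict(grouped)


graph = group_by_subject(triples)
for subject, edges in graph.items():
    for predicate, obj in edges:
        print(f"{subject} -> {predicate} -> {obj}")
```

Grouping by subject gives a simple adjacency view of the statements, which is roughly the shape a knowledge graph ingests; a real pipeline would follow the schema and identifiers defined by the target graph rather than free-text labels.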
