TinyGenius: Intertwining Natural Language Processing with Microtask Crowdsourcing for Scholarly Knowledge Graph Creation

Paper title

TinyGenius: Intertwining Natural Language Processing with Microtask Crowdsourcing for Scholarly Knowledge Graph Creation

Authors

Oelen, Allard; Stocker, Markus; Auer, Sören

Abstract

As the number of published scholarly articles grows steadily each year, new methods are needed to organize scholarly knowledge so that it can be discovered and used more efficiently. Natural Language Processing (NLP) techniques can process scholarly articles autonomously at scale and create machine-readable representations of the article content. However, autonomous NLP methods are still far from accurate enough to create a high-quality knowledge graph, and quality is crucial for the graph to be useful in practice. We present TinyGenius, a methodology for validating NLP-extracted scholarly knowledge statements using crowdsourced microtasks. The scholarly context in which the crowd workers operate poses multiple challenges. The explainability of the employed NLP methods is crucial for providing the context that supports the crowd workers' decision process. We employed TinyGenius to populate a paper-centric knowledge graph using five distinct NLP methods. The resulting knowledge graph serves as a digital library for scholarly articles.
