In Layman's Terms: Semi-Open Relation Extraction from Scientific Texts

Paper Title

In Layman's Terms: Semi-Open Relation Extraction from Scientific Texts

Authors

Ruben Kruiper, Julian F. V. Vincent, Jessica Chen-Burger, Marc P. Y. Desmulliez, Ioannis Konstas

Abstract

Information Extraction (IE) from scientific texts can be used to guide readers to the central information in scientific documents. But narrow IE systems extract only a fraction of the information captured, and Open IE systems do not perform well on the long and complex sentences encountered in scientific texts. In this work we combine the output of both types of systems to achieve Semi-Open Relation Extraction, a new task that we explore in the Biology domain. First, we present the Focused Open Biological Information Extraction (FOBIE) dataset and use FOBIE to train a state-of-the-art narrow scientific IE system to extract trade-off relations and arguments that are central to biology texts. We then run both the narrow IE system and a state-of-the-art Open IE system on a corpus of 10k open-access scientific biological texts. We show that a significant amount (65%) of erroneous and uninformative Open IE extractions can be filtered using narrow IE extractions. Furthermore, we show that the retained extractions are significantly more often informative to a reader.
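
The abstract does not say exactly how narrow IE extractions are used to filter Open IE extractions. As a rough illustration only, the sketch below assumes a hypothetical rule: an Open IE triple is kept if one of its arguments contains every token of some argument that the narrow (trade-off) IE system found in the same sentence. The names `Triple`, `NarrowArg`, `semi_open_filter` and the toy sentences are invented for this example and do not come from FOBIE or the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """An Open IE extraction (arg1, relation, arg2) from one sentence (hypothetical format)."""
    sentence_id: int
    arg1: str
    relation: str
    arg2: str


@dataclass(frozen=True)
class NarrowArg:
    """An argument span produced by the narrow IE system (hypothetical format)."""
    sentence_id: int
    text: str


def tokens(text: str) -> set[str]:
    # Naive whitespace tokenisation, lower-cased; enough for a sketch.
    return {t.lower() for t in text.split()}


def covers(span: str, narrow: str) -> bool:
    # Assumed criterion: every token of the narrow argument appears in the Open IE argument.
    narrow_tokens = tokens(narrow)
    return bool(narrow_tokens) and narrow_tokens <= tokens(span)


def semi_open_filter(open_ie: list[Triple], narrow_args: list[NarrowArg]) -> list[Triple]:
    """Keep only Open IE triples anchored to a narrow IE argument in the same sentence."""
    by_sentence: dict[int, list[str]] = {}
    for arg in narrow_args:
        by_sentence.setdefault(arg.sentence_id, []).append(arg.text)
    kept = []
    for triple in open_ie:
        candidates = by_sentence.get(triple.sentence_id, [])
        if any(covers(triple.arg1, n) or covers(triple.arg2, n) for n in candidates):
            kept.append(triple)
    return kept


# Toy example: only the first triple mentions a narrow IE argument from its sentence.
open_ie = [
    Triple(0, "thicker shells", "provide", "better protection against predators"),
    Triple(0, "this", "is", "shown in Figure 2"),
    Triple(1, "the species", "lives in", "warm water"),
]
narrow = [NarrowArg(0, "protection"), NarrowArg(0, "growth rate")]

kept = semi_open_filter(open_ie, narrow)
print(f"kept {len(kept)} of {len(open_ie)} Open IE triples")
```

Here the uninformative triple ("this", "is", "shown in Figure 2") and the triple from a sentence with no narrow IE extraction are dropped. A token-subset test is only one possible anchoring criterion; span offsets or fuzzy matching would be equally plausible choices.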
