Paper Title

MAVEN-ERE: A Unified Large-scale Dataset for Event Coreference, Temporal, Causal, and Subevent Relation Extraction

Authors

Xiaozhi Wang, Yulin Chen, Ning Ding, Hao Peng, Zimu Wang, Yankai Lin, Xu Han, Lei Hou, Juanzi Li, Zhiyuan Liu, Peng Li, Jie Zhou

Abstract

The diverse relationships among real-world events, including coreference, temporal, causal, and subevent relations, are fundamental to understanding natural languages. However, two drawbacks of existing datasets limit event relation extraction (ERE) tasks: (1) Small scale. Due to the annotation complexity, the data scale of existing datasets is limited, which cannot well train and evaluate data-hungry models. (2) Absence of unified annotation. Different types of event relations naturally interact with each other, but existing datasets only cover limited relation types at once, which prevents models from taking full advantage of relation interactions. To address these issues, we construct a unified large-scale human-annotated ERE dataset MAVEN-ERE with improved annotation schemes. It contains 103,193 event coreference chains, 1,216,217 temporal relations, 57,992 causal relations, and 15,841 subevent relations, which is larger than existing datasets of all the ERE tasks by at least an order of magnitude. Experiments show that ERE on MAVEN-ERE is quite challenging, and considering relation interactions with joint learning can improve performances. The dataset and source codes can be obtained from https://github.com/THU-KEG/MAVEN-ERE.
