Paper Title

Generic Temporal Reasoning with Differential Analysis and Explanation

Paper Authors

Yu Feng, Ben Zhou, Haoyu Wang, Helen Jin, Dan Roth

Paper Abstract

Temporal reasoning is the task of predicting temporal relations of event pairs. While temporal reasoning models can perform reasonably well on in-domain benchmarks, we have little idea of these systems' generalizability due to existing datasets' limitations. In this work, we introduce a novel task named TODAY that bridges this gap with temporal differential analysis, which as the name suggests, evaluates whether systems can correctly understand the effect of incremental changes. Specifically, TODAY introduces slight contextual changes for given event pairs, and systems are asked to tell how this subtle contextual change would affect relevant temporal relation distributions. To facilitate learning, TODAY also annotates human explanations. We show that existing models, including GPT-3.5, drop to random guessing on TODAY, suggesting that they heavily rely on spurious information rather than proper reasoning for temporal predictions. On the other hand, we show that TODAY's supervision style and explanation annotations can be used in joint learning, encouraging models to use more appropriate signals during training and thus outperform across several benchmarks. TODAY can also be used to train models to solicit incidental supervision from noisy sources such as GPT-3.5, thus moving us more toward the goal of generic temporal reasoning systems.
