Paper Title

Can Pretrained Language Models (Yet) Reason Deductively?

Paper Authors

Zhangdie Yuan, Songbo Hu, Ivan Vulić, Anna Korhonen, Zaiqiao Meng

Paper Abstract

Acquiring factual knowledge with Pretrained Language Models (PLMs) has attracted increasing attention, showing promising performance in many knowledge-intensive tasks. Their good performance has led the community to believe that the models do possess a modicum of reasoning competence rather than merely memorising the knowledge. In this paper, we conduct a comprehensive evaluation of the learnable deductive (also known as explicit) reasoning capability of PLMs. Through a series of controlled experiments, we posit two main findings. (i) PLMs inadequately generalise learned logic rules and perform inconsistently against simple adversarial surface form edits. (ii) While the deductive reasoning fine-tuning of PLMs does improve their performance on reasoning over unseen knowledge facts, it results in catastrophically forgetting the previously learnt knowledge. Our main results suggest that PLMs cannot yet perform reliable deductive reasoning, demonstrating the importance of controlled examinations and probing of PLMs' reasoning abilities; we reach beyond (misleading) task performance, revealing that PLMs are still far from human-level reasoning capabilities, even for simple deductive tasks.
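The abstract's notion of probing deductive consistency against surface form edits can be illustrated with a small cloze-style check. The sketch below is not the paper's actual benchmark or protocol; the model choice, premises, and prompt wordings are assumptions made purely for illustration. It queries a masked language model with two paraphrases of the same deductive inference and compares the top predictions.

```python
# Minimal sketch: probe a PLM's deductive consistency with cloze prompts.
# Assumptions (not from the paper): bert-base-uncased as the probed model,
# and these two hand-written paraphrases of one simple syllogism.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# If the model reasons over "a robin is a bird" and "all birds can fly",
# both surface forms should yield consistent completions (ideally "fly").
prompts = [
    "A robin is a bird and all birds can fly, so a robin can [MASK].",
    "All birds can fly. A robin is a bird. Therefore, a robin is able to [MASK].",
]

for prompt in prompts:
    top = fill(prompt, top_k=3)
    answers = [(cand["token_str"], round(cand["score"], 3)) for cand in top]
    print(prompt)
    print("  top-3 predictions:", answers)
```

Divergent or unstable completions across such paraphrases would be an instance of the inconsistency under simple surface form edits that the paper reports.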
