Paper Title
Do language models have coherent mental models of everyday things?
Paper Authors
Paper Abstract
When people think of everyday things like an egg, they typically have a mental image associated with it. This allows them to correctly judge, for example, that "the yolk surrounds the shell" is a false statement. Do language models similarly have a coherent picture of such everyday things? To investigate this, we propose a benchmark dataset consisting of 100 everyday things, their parts, and the relationships between these parts, expressed as 11,720 "X relation Y?" true/false questions. Using these questions as probes, we observe that state-of-the-art pre-trained language models (LMs) like GPT-3 and Macaw have fragments of knowledge about these everyday things, but do not have fully coherent "parts mental models" (54-59% accurate, 19-43% conditional constraint violation). We propose an extension where we add a constraint satisfaction layer on top of the LM's raw predictions to apply commonsense constraints. As well as removing inconsistencies, we find that this also significantly improves accuracy (by 16-20%), suggesting how the incoherence of the LM's pictures of everyday things can be significantly reduced.
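The abstract describes a two-step idea: probe the LM with "X relation Y?" true/false questions, then apply a commonsense constraint layer on top of the raw predictions. Below is a minimal illustrative sketch of that idea in Python; the `query_lm` stub, the toy belief scores, and the single asymmetry constraint are assumptions made for illustration only, not the paper's actual probing prompts or constraint solver.

```python
# Minimal sketch (not the paper's actual method): (1) collect an LM's raw
# true/false beliefs over "X relation Y?" questions, (2) apply one commonsense
# constraint on top of them. The constraint enforced here is that "surrounds"
# is asymmetric: "X surrounds Y" and "Y surrounds X" cannot both be true.

def query_lm(question: str) -> float:
    # Hypothetical placeholder so the sketch runs end to end.
    # In practice this would call an LM (e.g. GPT-3 or Macaw) and
    # return the probability it assigns to the answer "true".
    toy_beliefs = {
        "Does the shell surround the yolk?": 0.9,
        "Does the yolk surround the shell?": 0.7,  # inconsistent raw belief
    }
    return toy_beliefs.get(question, 0.1)

def probe(pairs, relation="surround"):
    """Ask the LM about each ordered (X, Y) part pair for one relation."""
    return {(x, y): query_lm(f"Does the {x} {relation} the {y}?") for x, y in pairs}

def enforce_asymmetry(beliefs, threshold=0.5):
    """If both directions of an asymmetric relation are judged true,
    keep the higher-confidence direction and flip the other."""
    decisions = {pair: p >= threshold for pair, p in beliefs.items()}
    for (x, y), true_xy in list(decisions.items()):
        if true_xy and decisions.get((y, x), False):
            weaker = (x, y) if beliefs[(x, y)] < beliefs[(y, x)] else (y, x)
            decisions[weaker] = False
    return decisions

pairs = [("shell", "yolk"), ("yolk", "shell")]
raw = probe(pairs)
print(enforce_asymmetry(raw))  # {('shell', 'yolk'): True, ('yolk', 'shell'): False}
```

In this toy run the raw LM beliefs mark both directions of "surrounds" as true; the constraint step keeps the higher-confidence direction and flips the weaker one, mirroring (in a much simplified form) how a constraint satisfaction layer can remove inconsistencies in the LM's raw answers.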