Paper Title
Do language models have coherent mental models of everyday things?
Paper Authors
Paper Abstract
When people think of everyday things like an egg, they typically have a mental image associated with it. This allows them to correctly judge, for example, that "the yolk surrounds the shell" is a false statement. Do language models similarly have a coherent picture of such everyday things? To investigate this, we propose a benchmark dataset consisting of 100 everyday things, their parts, and the relationships between these parts, expressed as 11,720 "X relation Y?" true/false questions. Using these questions as probes, we observe that state-of-the-art pre-trained language models (LMs) like GPT-3 and Macaw have fragments of knowledge about these everyday things, but do not have fully coherent "parts mental models" (54-59% accurate, 19-43% conditional constraint violation). We propose an extension where we add a constraint satisfaction layer on top of the LM's raw predictions to apply commonsense constraints. As well as removing inconsistencies, we find that this also significantly improves accuracy (by 16-20%), suggesting how the incoherence of the LM's pictures of everyday things can be significantly reduced.
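The abstract describes a two-step idea: probe the LM with "X relation Y?" true/false questions, then apply a commonsense constraint layer on top of the raw predictions. Below is a minimal illustrative sketch of that idea in Python; the `query_lm` stub, the toy belief scores, and the single asymmetry constraint are assumptions made for illustration only, not the paper's actual probing prompts or constraint solver.

```python
# Minimal sketch (not the paper's actual method): (1) collect an LM's raw
# true/false beliefs over "X relation Y?" questions, (2) apply one commonsense
# constraint on top of them. The constraint enforced here is that "surrounds"
# is asymmetric: "X surrounds Y" and "Y surrounds X" cannot both be true.

def query_lm(question: str) -> float:
    # Hypothetical placeholder so the sketch runs end to end.
    # In practice this would call an LM (e.g. GPT-3 or Macaw) and
    # return the probability it assigns to the answer "true".
    toy_beliefs = {
        "Does the shell surround the yolk?": 0.9,
        "Does the yolk surround the shell?": 0.7,  # inconsistent raw belief
    }
    return toy_beliefs.get(question, 0.1)

def probe(pairs, relation="surround"):
    """Ask the LM about each ordered (X, Y) part pair for one relation."""
    return {(x, y): query_lm(f"Does the {x} {relation} the {y}?") for x, y in pairs}

def enforce_asymmetry(beliefs, threshold=0.5):
    """If both directions of an asymmetric relation are judged true,
    keep the higher-confidence direction and flip the other."""
    decisions = {pair: p >= threshold for pair, p in beliefs.items()}
    for (x, y), true_xy in list(decisions.items()):
        if true_xy and decisions.get((y, x), False):
            weaker = (x, y) if beliefs[(x, y)] < beliefs[(y, x)] else (y, x)
            decisions[weaker] = False
    return decisions

pairs = [("shell", "yolk"), ("yolk", "shell")]
raw = probe(pairs)
print(enforce_asymmetry(raw))  # {('shell', 'yolk'): True, ('yolk', 'shell'): False}
```

In this toy run the raw LM beliefs mark both directions of "surrounds" as true; the constraint step keeps the higher-confidence direction and flips the weaker one, mirroring (in a much simplified form) how a constraint satisfaction layer can remove inconsistencies in the LM's raw answers.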