Paper Title
The Turking Test: Can Language Models Understand Instructions?
Paper Authors
Paper Abstract
Supervised machine learning provides the learner with a set of input-output examples of the target task. Humans, however, can also learn to perform new tasks from instructions in natural language. Can machines learn to understand instructions as well? We present the Turking Test, which examines a model's ability to follow natural language instructions of varying complexity. These range from simple tasks, like retrieving the nth word of a sentence, to ones that require creativity, such as generating examples for SNLI and SQuAD in place of human intelligence workers ("turkers"). Despite our lenient evaluation methodology, we observe that a large pretrained language model performs poorly across all tasks. Analyzing the model's error patterns reveals that the model tends to ignore explicit instructions and often generates outputs that cannot be construed as an attempt to solve the task. While it is not yet clear whether instruction understanding can be captured by traditional language models, the sheer expressivity of instruction understanding makes it an appealing alternative to the rising few-shot inference paradigm.