Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies

Paper Title

Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies

Authors

Aher, Gati; Arriaga, Rosa I.; Kalai, Adam Tauman

Abstract

We introduce a new type of test, called a Turing Experiment (TE), for evaluating to what extent a given language model, such as a GPT model, can simulate different aspects of human behavior. A TE can also reveal consistent distortions in a language model's simulation of a specific human behavior. Unlike the Turing Test, which involves simulating a single arbitrary individual, a TE requires simulating a representative sample of participants in human subject research. We carry out TEs that attempt to replicate well-established findings from prior studies. We design a methodology for simulating TEs and illustrate its use to compare how well different language models are able to reproduce classic economic, psycholinguistic, and social psychology experiments: Ultimatum Game, Garden Path Sentences, Milgram Shock Experiment, and Wisdom of Crowds. In the first three TEs, the existing findings were replicated using recent models, while the last TE reveals a "hyper-accuracy distortion" present in some language models (including ChatGPT and GPT-4), which could affect downstream applications in education and the arts.
