Paper Title
The Turing Deception
Paper Authors
Paper Abstract
This research revisits the classic Turing test and compares recent large language models such as ChatGPT for their abilities to reproduce human-level comprehension and compelling text generation. Two task challenges -- summarization and question answering -- prompt ChatGPT to produce original content (98-99%) from a single text entry and also to answer sequential questions originally posed by Turing in 1950. We score the original and generated content against the OpenAI GPT-2 Output Detector from 2019, and establish multiple cases where the generated content proves original and undetectable (98%). The question of a machine fooling a human judge recedes in this work relative to the question of "how would one prove it?" The original contribution of the work presents a metric and a simple grammatical set for understanding the writing mechanics of chatbots, evaluating their readability and statistical clarity, engagement, delivery, and overall quality. While Turing's original prose scores at least 14% below the machine-generated output, the question of whether an algorithm displays hints of Turing's truly original thoughts (the "Lovelace 2.0" test) remains unanswered and potentially unanswerable for now.
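The abstract refers to a metric for scoring the readability of chatbot prose. As a minimal, hypothetical illustration only -- the paper's actual metric and grammatical set are not reproduced here -- the classic Flesch reading-ease formula shows the general shape of such a readability score, combining average sentence length and average syllables per word:

```python
import re

def flesch_reading_ease(text: str) -> float:
    """Classic Flesch reading-ease score: higher means easier to read.
    Illustrative only; not the metric proposed in the paper."""
    # Split on sentence-ending punctuation; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)

    def syllables(word: str) -> int:
        # Crude syllable estimate: count runs of vowels, minimum one.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    n_sent = max(1, len(sentences))
    n_words = max(1, len(words))
    total_syll = sum(syllables(w) for w in words)
    return 206.835 - 1.015 * (n_words / n_sent) - 84.6 * (total_syll / n_words)
```

On this scale, short common-word sentences score high (easy), while long sentences of polysyllabic words score low, which is the qualitative behavior any prose-readability comparison between human and machine text relies on.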