Paper title
ThinkSum: Probabilistic reasoning over sets using large language models
Paper authors
Paper abstract
Large language models (LLMs) have a substantial capacity for high-level analogical reasoning: reproducing patterns in linear text that occur in their training data (zero-shot evaluation) or in the provided context (few-shot in-context learning). However, recent studies show that even the more advanced LLMs fail in scenarios that require reasoning over multiple objects or facts and making sequences of logical deductions. We propose a two-stage probabilistic inference paradigm, ThinkSum, which reasons over sets of objects or facts in a structured manner. In the first stage (Think - retrieval of associations), an LLM is queried in parallel over a set of phrases extracted from the prompt or an auxiliary model call. In the second stage (Sum - probabilistic inference or reasoning), the results of these queries are aggregated to make the final prediction. We demonstrate the possibilities and advantages of ThinkSum on the BIG-bench suite of LLM evaluation tasks, achieving improvements over the state of the art using GPT-family models on thirteen difficult tasks, often with far smaller model variants. We also compare and contrast ThinkSum with other proposed modifications to direct prompting of LLMs, such as variants of chain-of-thought prompting. Our results suggest that because the probabilistic inference in ThinkSum is performed outside of calls to the LLM, ThinkSum is less sensitive to prompt design, yields more interpretable predictions, and can be flexibly combined with latent variable models to extract structured knowledge from LLMs. Overall, our proposed paradigm represents a promising approach for enhancing the reasoning capabilities of LLMs.
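The two-stage paradigm described above can be sketched in miniature. This is a minimal, hypothetical illustration, not the paper's implementation: `toy_logprob` is an invented stand-in for a real LLM log-probability query, and the aggregation shown (averaging probabilities across the phrase set per candidate answer) is one plausible instance of the "Sum" stage performed outside the model.

```python
import math

def think(llm_logprob, phrases, candidates):
    # Think stage: query the model independently (parallelizable) for each
    # (phrase, candidate) pair, collecting log-probabilities per candidate.
    return {c: [llm_logprob(p, c) for p in phrases] for c in candidates}

def sum_stage(scores):
    # Sum stage: aggregate the per-candidate evidence in probability space
    # (here, a simple mixture/average), entirely outside the LLM call.
    agg = {c: sum(math.exp(lp) for lp in lps) / len(lps)
           for c, lps in scores.items()}
    return max(agg, key=agg.get), agg

def toy_logprob(phrase, candidate):
    # Hypothetical stand-in for an LLM scoring call, e.g. the model's
    # log-probability of `candidate` given a prompt built from `phrase`.
    table = {("a penguin", "bird"): -0.2, ("a penguin", "fish"): -2.0,
             ("it swims",  "bird"): -1.5, ("it swims",  "fish"): -0.7}
    return table[(phrase, candidate)]

scores = think(toy_logprob, ["a penguin", "it swims"], ["bird", "fish"])
prediction, aggregated = sum_stage(scores)
```

Because each Think query is a separate, simple model call and the final decision is an explicit arithmetic aggregation, the intermediate per-phrase probabilities remain inspectable, which is the source of the interpretability and prompt-robustness claims in the abstract.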