Paper Title
Large Language Models Can Self-Improve
Paper Authors
Paper Abstract
Large Language Models (LLMs) have achieved excellent performance on various tasks. However, fine-tuning an LLM requires extensive supervision. Humans, on the other hand, can improve their reasoning abilities by self-thinking without external inputs. In this work, we demonstrate that an LLM is also capable of self-improving with only unlabeled datasets. We use a pre-trained LLM to generate "high-confidence" rationale-augmented answers for unlabeled questions using Chain-of-Thought prompting and self-consistency, and fine-tune the LLM using those self-generated solutions as target outputs. We show that our approach improves the general reasoning ability of a 540B-parameter LLM (74.4% -> 82.1% on GSM8K, 78.2% -> 83.0% on DROP, 90.0% -> 94.4% on OpenBookQA, and 63.4% -> 67.9% on ANLI-A3) and achieves state-of-the-art-level performance without any ground-truth labels. We conduct ablation studies and show that fine-tuning on reasoning is critical for self-improvement.
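To make the pipeline described in the abstract concrete, below is a minimal Python sketch of one plausible reading of the self-improvement loop: sample several Chain-of-Thought reasoning paths per unlabeled question, take the majority-vote answer (self-consistency), and keep only the consistent rationales as fine-tuning targets. The `llm.sample` interface, the prompt and answer-parsing helpers, the temperature, and the agreement threshold are all illustrative assumptions, not the paper's exact implementation.

```python
from collections import Counter

def cot_prompt(question: str) -> str:
    # Minimal Chain-of-Thought prompt; the paper prepends few-shot
    # CoT exemplars, omitted here for brevity.
    return f"Q: {question}\nA: Let's think step by step."

def parse_final_answer(rationale: str) -> str:
    # Assumes each sampled rationale ends with "The answer is X."
    return rationale.rsplit("The answer is", 1)[-1].strip(" .\n")

def build_self_training_set(llm, unlabeled_questions,
                            num_samples=32, min_agreement=0.5):
    """Generate (question, rationale) fine-tuning pairs without labels:
    sample several CoT reasoning paths per question, take the
    majority-vote answer (self-consistency), and keep only the
    rationales that reach that answer."""
    examples = []
    for question in unlabeled_questions:
        # llm.sample is an assumed interface: one temperature-sampled
        # completion per call, so the reasoning paths are diverse.
        rationales = [llm.sample(cot_prompt(question), temperature=0.7)
                      for _ in range(num_samples)]
        answers = [parse_final_answer(r) for r in rationales]
        majority_answer, votes = Counter(answers).most_common(1)[0]
        # "High-confidence" filter: skip questions where the sampled
        # paths disagree too much (the threshold is an assumption).
        if votes / num_samples >= min_agreement:
            examples.extend((question, r)
                            for r, a in zip(rationales, answers)
                            if a == majority_answer)
    return examples  # used as input/target pairs to fine-tune the LLM
```

The key design point this sketch captures is that agreement among sampled reasoning paths stands in for ground-truth labels: the model is fine-tuned only on its own answers that are self-consistent enough to be treated as high-confidence.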