Paper Title
Large Language Models Can Self-Improve
Paper Authors
Paper Abstract
Large Language Models (LLMs) have achieved excellent performance on various tasks. However, fine-tuning an LLM requires extensive supervision. Humans, on the other hand, can improve their reasoning abilities by self-thinking without external inputs. In this work, we demonstrate that an LLM is also capable of self-improving with only unlabeled datasets. We use a pre-trained LLM to generate "high-confidence" rationale-augmented answers for unlabeled questions using Chain-of-Thought prompting and self-consistency, and fine-tune the LLM using those self-generated solutions as target outputs. We show that our approach improves the general reasoning ability of a 540B-parameter LLM (74.4% -> 82.1% on GSM8K, 78.2% -> 83.0% on DROP, 90.0% -> 94.4% on OpenBookQA, and 63.4% -> 67.9% on ANLI-A3) and achieves state-of-the-art-level performance without any ground-truth labels. We conduct ablation studies and show that fine-tuning on reasoning is critical for self-improvement.
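To make the pipeline described in the abstract concrete, below is a minimal Python sketch of one plausible reading of the self-improvement loop: sample several Chain-of-Thought reasoning paths per unlabeled question, take the majority-vote answer (self-consistency), and keep only the consistent rationales as fine-tuning targets. The `llm.sample` interface, the prompt and answer-parsing helpers, the temperature, and the agreement threshold are all illustrative assumptions, not the paper's exact implementation.

```python
from collections import Counter

def cot_prompt(question: str) -> str:
    # Minimal Chain-of-Thought prompt; the paper prepends few-shot
    # CoT exemplars, omitted here for brevity.
    return f"Q: {question}\nA: Let's think step by step."

def parse_final_answer(rationale: str) -> str:
    # Assumes each sampled rationale ends with "The answer is X."
    return rationale.rsplit("The answer is", 1)[-1].strip(" .\n")

def build_self_training_set(llm, unlabeled_questions,
                            num_samples=32, min_agreement=0.5):
    """Generate (question, rationale) fine-tuning pairs without labels:
    sample several CoT reasoning paths per question, take the
    majority-vote answer (self-consistency), and keep only the
    rationales that reach that answer."""
    examples = []
    for question in unlabeled_questions:
        # llm.sample is an assumed interface: one temperature-sampled
        # completion per call, so the reasoning paths are diverse.
        rationales = [llm.sample(cot_prompt(question), temperature=0.7)
                      for _ in range(num_samples)]
        answers = [parse_final_answer(r) for r in rationales]
        majority_answer, votes = Counter(answers).most_common(1)[0]
        # "High-confidence" filter: skip questions where the sampled
        # paths disagree too much (the threshold is an assumption).
        if votes / num_samples >= min_agreement:
            examples.extend((question, r)
                            for r, a in zip(rationales, answers)
                            if a == majority_answer)
    return examples  # used as input/target pairs to fine-tune the LLM
```

The key design point this sketch captures is that agreement among sampled reasoning paths stands in for ground-truth labels: the model is fine-tuned only on its own answers that are self-consistent enough to be treated as high-confidence.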