Learning to Generate Prompts for Dialogue Generation through Reinforcement Learning

Paper title

Learning to Generate Prompts for Dialogue Generation through Reinforcement Learning

Authors

Hsuan Su, Pohan Chi, Shih-Cheng Huang, Chung Ho Lam, Saurav Sahay, Shang-Tse Chen, Hung-yi Lee

Abstract

Much of the literature shows that prompt-based learning is an efficient way to use large pre-trained language models. Recent work also shows that a chatbot's output can be steered by plugging in an appropriate prompt. Gradient-based methods are often used to perturb the prompts. However, some language models are not publicly available. In this work, we first explore combining prompting with reinforcement learning (RL) to steer a model's generation without accessing any of its parameters. Second, to reduce training effort and improve generalization to unseen tasks, we apply multi-task learning. Experimental results show that the proposed method can successfully control several state-of-the-art (SOTA) dialogue models without accessing their parameters. Furthermore, the model adapts to an unseen task in fewer steps than the baseline model.
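
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: a policy picks a prompt, a dialogue model that can be queried but not inspected produces a reply, and a reward on that reply updates the policy. It swaps in a plain REINFORCE update over a fixed list of candidate prompts, which is an assumption and not the authors' method. The names `PROMPTS`, `black_box_chatbot`, `reward`, and `train` are all hypothetical, and the chatbot is a lookup table standing in for a real model.

```python
import math
import random

# Hypothetical candidate prompts. A fixed list keeps the sketch small;
# it is an assumption, not something described in the abstract.
PROMPTS = [
    "Tell me something sad.",
    "Tell me something happy.",
    "Tell me about the weather.",
]


def black_box_chatbot(prompt: str) -> str:
    """Stand-in for a dialogue model we can only query (no parameters, no gradients)."""
    replies = {
        PROMPTS[0]: "That is terrible news.",
        PROMPTS[1]: "What a wonderful, great day!",
        PROMPTS[2]: "It might rain later.",
    }
    return replies[prompt]


def reward(response: str) -> float:
    """Toy reward: number of positive words in the reply (hypothetical word list)."""
    positive = {"wonderful", "great"}
    cleaned = response.lower()
    for ch in ",.!?":
        cleaned = cleaned.replace(ch, " ")
    return float(sum(word in positive for word in cleaned.split()))


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps: int = 500, lr: float = 0.5, seed: int = 0):
    """REINFORCE over a categorical prompt policy, using only the chatbot's outputs."""
    rng = random.Random(seed)
    logits = [0.0] * len(PROMPTS)
    baseline = 0.0  # running average of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(PROMPTS)), weights=probs)[0]
        r = reward(black_box_chatbot(PROMPTS[action]))
        advantage = r - baseline
        baseline += 0.1 * (r - baseline)
        # Gradient of log pi(action) w.r.t. logit k is 1[k == action] - probs[k].
        for k in range(len(logits)):
            grad = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    final_probs = train()
    for prompt, p in zip(PROMPTS, final_probs):
        print(f"{p:.3f}  {prompt}")
```

The update never touches the chatbot's internals: only the sampled prompt and the scalar reward are used, which is what makes this style of training applicable to models exposed solely through an API.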
