Paper Title

Late Prompt Tuning: A Late Prompt Could Be Better Than Many Prompts

Paper Authors

Xiangyang Liu, Tianxiang Sun, Xuanjing Huang, Xipeng Qiu

Paper Abstract

Prompt tuning is a parameter-efficient tuning (PETuning) method for utilizing pre-trained models (PTMs) that simply prepends a soft prompt to the input and only optimizes the prompt to adapt PTMs to downstream tasks. Although it is parameter- and deployment-efficient, its performance still lags behind other state-of-the-art PETuning methods. Besides, the training cost of prompt tuning is not significantly reduced due to the back-propagation through the entire model. Through empirical analyses, we shed some light on the lagging performance of prompt tuning and recognize a trade-off between the propagation distance from label signals to the inserted prompt and the influence of the prompt on model outputs. Further, we present Late Prompt Tuning (LPT) that inserts a late prompt into an intermediate layer of the PTM instead of the input layer or all layers. The late prompt is obtained by a neural prompt generator conditioned on the hidden states before the prompt insertion layer and therefore is instance-dependent. Through extensive experimental results across various tasks and PTMs, we show that LPT can achieve competitive performance to full model tuning and other PETuning methods under both full-data and few-shot scenarios while possessing faster training speed and lower memory cost.
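The abstract describes the core mechanism of LPT: a small prompt generator reads the hidden states just below an intermediate layer of the frozen PTM, produces an instance-dependent late prompt, and that prompt is inserted at the intermediate layer. The following PyTorch sketch is only an illustration of that idea, not the authors' implementation; the mean pooling, bottleneck MLP, and prepend-by-concatenation steps are assumptions made for clarity.

# Minimal sketch (assumptions noted above), not the released LPT code.
import torch
import torch.nn as nn

class LatePromptGenerator(nn.Module):
    """Generates an instance-dependent soft prompt from intermediate hidden states."""
    def __init__(self, hidden_size: int, prompt_len: int, bottleneck: int = 64):
        super().__init__()
        self.prompt_len = prompt_len
        # A small bottleneck MLP keeps the number of trainable parameters low.
        self.generator = nn.Sequential(
            nn.Linear(hidden_size, bottleneck),
            nn.Tanh(),
            nn.Linear(bottleneck, prompt_len * hidden_size),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size) from the layer just before insertion.
        pooled = hidden_states.mean(dim=1)          # simple mean pooling (illustrative choice)
        prompt = self.generator(pooled)             # (batch, prompt_len * hidden_size)
        return prompt.view(-1, self.prompt_len, hidden_states.size(-1))

def insert_late_prompt(hidden_states: torch.Tensor,
                       prompt_generator: LatePromptGenerator) -> torch.Tensor:
    """Prepend the generated late prompt to the hidden states at the insertion layer."""
    prompt = prompt_generator(hidden_states)        # (batch, prompt_len, hidden_size)
    # In a real PTM forward pass the attention mask would also need to be extended
    # by prompt_len positions so the upper layers can attend to the prompt.
    return torch.cat([prompt, hidden_states], dim=1)

In such a setup the PTM's own weights stay frozen and only the generator (plus the task head) is trained, so gradients only need to flow down to the insertion layer rather than through the entire model; this is consistent with the faster training and lower memory cost reported in the abstract.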
