Paper Title
Decoder Tuning: Efficient Language Understanding as Decoding
Paper Authors
Paper Abstract
With the ever-growing sizes of pre-trained models (PTMs), it has become an emerging practice to provide only inference APIs for users, namely the model-as-a-service (MaaS) setting. To adapt PTMs with model parameters frozen, most current approaches focus on the input side, seeking powerful prompts to stimulate models for correct answers. However, we argue that input-side adaptation can be arduous due to the lack of gradient signals, and such methods usually require thousands of API queries, resulting in high computation and time costs. In light of this, we present Decoder Tuning (DecT), which in contrast optimizes task-specific decoder networks on the output side. Specifically, DecT first extracts prompt-stimulated output scores for initial predictions. On top of that, we train an additional decoder network on the output representations to incorporate posterior data knowledge. With gradient-based optimization, DecT can be trained within several seconds and requires only one PTM query per sample. Empirically, we conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $200\times$ speed-up.
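The abstract describes the pipeline at a high level: query the frozen PTM once per sample to obtain prompt-stimulated class scores and output representations, cache them, and then train a small task-specific decoder on those cached outputs with ordinary gradient descent. The sketch below illustrates that output-side recipe under stated assumptions; it is not the authors' DecT implementation. The API outputs are mocked with random tensors, and the single-linear-layer decoder and mixing weight `lam` are hypothetical choices for illustration.

```python
# Minimal sketch of output-side tuning as described in the abstract (assumptions,
# not the official DecT code). We assume the MaaS API returns, for each prompted
# input, an output representation and per-class prompt scores; both are mocked
# here with random tensors. A small decoder is trained on the cached outputs,
# so the frozen PTM is queried only once per sample.
import torch
import torch.nn as nn

torch.manual_seed(0)

num_classes, hidden_dim, num_train = 4, 768, 64  # illustrative sizes

# --- One PTM query per sample (mocked API outputs) -------------------------
reps = torch.randn(num_train, hidden_dim)            # stand-in for API output representations
prompt_scores = torch.randn(num_train, num_classes)  # stand-in for prompt-stimulated class scores
labels = torch.randint(0, num_classes, (num_train,))

# --- Task-specific decoder trained on the cached outputs -------------------
decoder = nn.Linear(hidden_dim, num_classes)          # hypothetical decoder network
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

lam = 1.0  # hypothetical weight mixing prompt scores with decoder scores
for epoch in range(30):
    optimizer.zero_grad()
    logits = prompt_scores + lam * decoder(reps)  # combine initial and decoder predictions
    loss = loss_fn(logits, labels)
    loss.backward()                               # gradients flow only through the decoder
    optimizer.step()

# Inference reuses each sample's single cached PTM query in the same way.
with torch.no_grad():
    preds = (prompt_scores + lam * decoder(reps)).argmax(dim=-1)
    print("train accuracy:", (preds == labels).float().mean().item())
```

Because the PTM outputs are cached after a single query per sample, the training loop touches only the small decoder, which is why the abstract can claim training in seconds rather than thousands of API calls.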