Paper Title
AdaVAE: Exploring Adaptive GPT-2s in Variational Auto-Encoders for Language Modeling
Paper Authors
Paper Abstract
Variational Auto-Encoder (VAE) has become the de-facto learning paradigm for achieving representation learning and generation for natural language at the same time. Nevertheless, existing VAE-based language models either employ elementary RNNs, which are not powerful enough to handle complex tasks in multi-task settings, or fine-tune two pre-trained language models (PLMs) for every downstream task, which is a huge drain on resources. In this paper, we propose the first VAE framework empowered with adaptive GPT-2s (AdaVAE). Different from existing systems, we unify both the encoder \& decoder of the VAE model using GPT-2s with adaptive parameter-efficient components, and further introduce a Latent Attention operation to better construct the latent space from transformer models. Experiments from multiple dimensions validate that AdaVAE effectively organizes language in three related tasks (language modeling, representation modeling, and guided text generation) even with less than $15\%$ of parameters activated during training. Our code is available at \url{https://github.com/ImKeTT/AdaVAE}.
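The abstract describes pooling transformer hidden states into a Gaussian latent space via a "Latent Attention" operation. The sketch below is a minimal, hypothetical illustration of that idea (a learned query attending over GPT-2 encoder states, projected to the mean and log-variance of the latent variable); the module names, dimensions, and single-query design are assumptions for illustration, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class LatentAttentionHead(nn.Module):
    """Illustrative attention-based pooling from encoder hidden states to a Gaussian latent.

    NOTE: This is a hedged sketch, not the official AdaVAE code. A learned query
    attends over the GPT-2 encoder hidden states, and the pooled summary is
    projected to the mean and log-variance of the latent variable z.
    """

    def __init__(self, hidden_dim: int, latent_dim: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, hidden_dim))  # learned query vector
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=1, batch_first=True)
        self.to_mean = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)

    def forward(self, hidden_states: torch.Tensor):
        # hidden_states: (batch, seq_len, hidden_dim) from the encoder GPT-2
        batch = hidden_states.size(0)
        q = self.query.expand(batch, -1, -1)
        pooled, _ = self.attn(q, hidden_states, hidden_states)  # (batch, 1, hidden_dim)
        pooled = pooled.squeeze(1)
        mean, logvar = self.to_mean(pooled), self.to_logvar(pooled)
        # Reparameterization trick: z = mean + eps * std
        z = mean + torch.randn_like(mean) * torch.exp(0.5 * logvar)
        return z, mean, logvar


if __name__ == "__main__":
    # Toy usage with GPT-2-sized hidden states (768) and a 32-dimensional latent.
    head = LatentAttentionHead(hidden_dim=768, latent_dim=32)
    fake_hidden = torch.randn(2, 16, 768)  # (batch=2, seq_len=16, hidden=768)
    z, mean, logvar = head(fake_hidden)
    print(z.shape)  # torch.Size([2, 32])
```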