Paper Title

Empowering Diffusion Models on the Embedding Space for Text Generation

Authors

Zhujin Gao, Junliang Guo, Xu Tan, Yongxin Zhu, Fang Zhang, Jiang Bian, Linli Xu

Abstract

Diffusion models have achieved state-of-the-art synthesis quality on both visual and audio tasks, and recent works further adapt them to textual data by diffusing on the embedding space. In this paper, we conduct systematic studies of the optimization challenges encountered with both the embedding space and the denoising model, which have not been carefully explored. First, the data distribution is learnable for embeddings, which may lead to the collapse of the embedding space and unstable training. To alleviate this problem, we propose a new objective called the anchor loss, which is more efficient than previous methods. Second, we find that the noise levels of conventional schedules are insufficient for training a desirable denoising model, while introducing varying degrees of degeneration as a consequence. To address this challenge, we propose a novel framework called noise rescaling. Based on the above analysis, we propose Difformer, an embedding diffusion model based on Transformer. Experiments on a variety of seminal text generation tasks show the effectiveness of the proposed methods and the superiority of Difformer over previous state-of-the-art embedding diffusion baselines.
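
The abstract names the anchor loss but does not spell out its form. Below is a minimal PyTorch sketch of one common way to realize such an objective, assuming the denoiser predicts the clean embedding z_0 and the loss ties that prediction back to the ground-truth tokens through the learnable embedding matrix; the function name `anchor_loss` and all shapes are illustrative assumptions, not the paper's reference implementation.

```python
# A hedged sketch of the anchor-loss idea: ground the denoiser's prediction
# of the clean embedding z_0 to the discrete tokens with a cross-entropy
# ("rounding") term, so the gradient also flows into the embedding table
# and discourages it from collapsing. All names and shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def anchor_loss(z0_pred: torch.Tensor, tokens: torch.Tensor,
                embedding: nn.Embedding) -> torch.Tensor:
    """Cross-entropy between predicted clean embeddings and ground-truth tokens.

    z0_pred:   (batch, seq_len, dim)  denoiser's estimate of the clean embedding z_0
    tokens:    (batch, seq_len)       ground-truth token ids
    embedding: nn.Embedding           the learnable (collapse-prone) embedding table
    """
    # Score the prediction against every vocabulary embedding; tying the
    # logits to embedding.weight is what anchors the embedding space.
    logits = z0_pred @ embedding.weight.T            # (batch, seq_len, vocab)
    # cross_entropy expects (N, C, ...) for the logits, hence the transpose.
    return F.cross_entropy(logits.transpose(1, 2), tokens)
```

Supervising the denoiser's output rather than the raw embedding means the embedding table receives a training signal that depends on the whole denoising pathway, which is one plausible reason such an objective stabilizes training.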
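Similarly, here is a hedged sketch of the noise-rescaling idea: the abstract reports that conventional schedules leave embeddings under-corrupted, so the noise level is rescaled upward. The scalar factor `F` and the exact forward-process form below are assumptions for illustration only; the paper's actual rescaling rule may differ.

```python
# A minimal sketch of noise rescaling under the assumption that the forward
# process keeps its standard form but amplifies the noise term by a factor
# F > 1, so the denoiser sees sufficiently corrupted embeddings.
import torch


def noised_latents(z0: torch.Tensor, alpha_bar_t: torch.Tensor,
                   F: float = 2.0) -> torch.Tensor:
    """Sample z_t ~ q(z_t | z_0) with the noise standard deviation scaled by F.

    z0:          (batch, seq_len, dim)  clean token embeddings
    alpha_bar_t: tensor broadcastable to z0, e.g. (batch, 1, 1), the
                 cumulative schedule value at the sampled timestep
    """
    noise = torch.randn_like(z0)
    # Standard DDPM-style corruption, except the noise std is multiplied by F.
    return alpha_bar_t.sqrt() * z0 + (1.0 - alpha_bar_t).sqrt() * F * noise
```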
