Paper Title
A Non-monotonic Self-terminating Language Model
Paper Authors
Paper Abstract
Recent large-scale neural autoregressive sequence models have shown impressive performance on a variety of natural language generation tasks. However, their generated sequences often exhibit degenerate properties such as non-termination, undesirable repetition, and premature termination, when generated with decoding algorithms such as greedy search, beam search, top-$k$ sampling, and nucleus sampling. In this paper, we focus on the problem of non-terminating sequences resulting from an incomplete decoding algorithm. We first define an incomplete probable decoding algorithm, which includes greedy search, top-$k$ sampling, and nucleus sampling, beyond the incomplete decoding algorithm originally put forward by Welleck et al. (2020). We then propose a non-monotonic self-terminating language model, which significantly relaxes the constraint of monotonically increasing termination probability in the self-terminating language model originally proposed by Welleck et al. (2020), to address the issue of non-terminating sequences when using incomplete probable decoding algorithms. We prove that our proposed model prevents non-terminating sequences when using not only incomplete probable decoding algorithms but also beam search. We empirically validate our model on sequence completion tasks with various architectures.
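The sketch below is only an illustration of the general idea described in the abstract, not the paper's exact parameterization: the probability of emitting the end-of-sequence token is kept above a floor that converges to 1 with sequence length (which guarantees termination), while the probability itself is otherwise free to fluctuate non-monotonically from step to step. The names `eos_floor`, `eos_probability`, and the value of `eps` are assumptions made for the example.

```python
# Illustrative sketch (assumed parameterization, not the paper's formula):
# an EOS probability that is lower-bounded by a curve converging to 1,
# but is allowed to rise and fall non-monotonically between steps.
import numpy as np

rng = np.random.default_rng(0)


def eos_floor(t: int, eps: float = 0.01) -> float:
    """Assumed lower bound on p(eos) at step t; tends to 1 as t grows."""
    return 1.0 - (1.0 - eps) ** (t + 1)


def eos_probability(eos_score: float, t: int, eps: float = 0.01) -> float:
    """Map an unconstrained EOS score to a probability above the floor.

    The result may increase or decrease from one step to the next
    (non-monotonic), but it can never drop below eos_floor(t), which
    forces the sequence to terminate eventually.
    """
    sigma = 1.0 / (1.0 + np.exp(-eos_score))
    floor = eos_floor(t, eps)
    return floor + (1.0 - floor) * sigma


# Simulate per-step EOS scores from a model and check the guarantee.
for t in range(5):
    score = rng.normal()          # stand-in for a model's EOS logit at step t
    p = eos_probability(score, t)
    assert p >= eos_floor(t)      # the floor is always respected
    print(f"step {t}: floor={eos_floor(t):.4f}, p(eos)={p:.4f}")
```

Under this assumed construction, greedy search, top-$k$ sampling, and nucleus sampling cannot indefinitely avoid the end-of-sequence token, since its probability eventually dominates; the design choice relative to a strictly monotone scheme is that the model can still lower its termination probability mid-sequence when the context calls for it.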