Paper Title

Limitations of Autoregressive Models and Their Alternatives

Paper Authors

Chu-Cheng Lin, Aaron Jaech, Xin Li, Matthew R. Gormley, Jason Eisner

Paper Abstract

Standard autoregressive language models perform only polynomial-time computation to compute the probability of the next symbol. While this is attractive, it means they cannot model distributions whose next-symbol probability is hard to compute. Indeed, they cannot even model them well enough to solve associated easy decision problems for which an engineer might want to consult a language model. These limitations apply no matter how much computation and data are used to train the model, unless the model is given access to oracle parameters that grow superpolynomially in sequence length. Thus, simply training larger autoregressive language models is not a panacea for NLP. Alternatives include energy-based models (which give up efficient sampling) and latent-variable autoregressive models (which give up efficient scoring of a given string). Both are powerful enough to escape the above limitations.
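
For readers, the three model families contrasted in the abstract can be summarized in standard notation (this sketch is ours, not taken from the paper) for a string x = x_1 ... x_T:

    Autoregressive:                  p(x) = \prod_{t=1}^{T} p(x_t \mid x_{<t})
    Energy-based:                    p(x) = \exp(-E(x)) / Z,  where  Z = \sum_{x'} \exp(-E(x'))
    Latent-variable autoregressive:  p(x) = \sum_{z} p(z) \prod_{t=1}^{T} p(x_t \mid x_{<t}, z)

In the autoregressive form, every factor is computed in polynomial time, so both sampling and exact scoring are efficient; this efficiency is exactly what the paper shows to be limiting. The energy-based form scores a given string efficiently up to the normalizer Z but gives up efficient sampling, while the latent-variable form samples efficiently by first drawing z but gives up efficient scoring, since evaluating p(x) requires marginalizing over z.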
