Paper Title
The Monte Carlo Transformer: a stochastic self-attention model for sequence prediction
Paper Authors
Paper Abstract
This paper introduces the Sequential Monte Carlo Transformer, an original approach that naturally captures the distribution of the observations within a transformer architecture. The keys, queries, values and attention vectors of the network are treated as the unobserved stochastic states of its hidden structure. Under this generative model, the observation received at each time step is a random function of the past states within a given attention window. In this general state-space setting, we use Sequential Monte Carlo methods to approximate the posterior distributions of the states given the observations, and to estimate the gradient of the log-likelihood. We hence propose a generative model that provides a predictive distribution rather than a single-point estimate.
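To make the idea in the abstract concrete, the sketch below illustrates stochastic keys, queries and values treated as latent states, with a bootstrap particle filter approximating their posterior and the likelihood. This is a minimal illustrative example, not the authors' model or code: the Gaussian perturbations of the projections, the Gaussian observation density, the multinomial resampling, the toy data, and all names (d_model, n_particles, noise_std, obs_std, window) are assumptions made here for illustration.

```python
# Minimal numpy sketch (illustrative, not the paper's implementation) of
# stochastic self-attention states tracked with a bootstrap particle filter.
import numpy as np

rng = np.random.default_rng(0)

d_model, n_particles, T, window = 4, 64, 20, 5
noise_std, obs_std = 0.1, 0.2

# "Learned" projection matrices; random stand-ins for this sketch.
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, 1))

observations = np.sin(0.3 * np.arange(T))[:, None]   # toy 1-D target sequence
inputs = rng.normal(size=(T, d_model))                # toy input embeddings

# Particle histories of the stochastic keys/values over the attention window.
keys = np.zeros((n_particles, 0, d_model))
values = np.zeros((n_particles, 0, 1))
log_lik = 0.0

for t in range(T):
    x = inputs[t]
    # Propagate: sample stochastic queries/keys/values around the projections.
    q = x @ W_q + noise_std * rng.normal(size=(n_particles, d_model))
    k = x @ W_k + noise_std * rng.normal(size=(n_particles, d_model))
    v = x @ W_v + noise_std * rng.normal(size=(n_particles, 1))

    keys = np.concatenate([keys, k[:, None, :]], axis=1)[:, -window:]
    values = np.concatenate([values, v[:, None, :]], axis=1)[:, -window:]

    # Self-attention over the window, computed independently for each particle.
    scores = np.einsum('nd,nwd->nw', q, keys) / np.sqrt(d_model)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    pred = np.einsum('nw,nwo->no', attn, values)      # per-particle prediction

    # Weight particles by the Gaussian likelihood of the new observation.
    logw = -0.5 * np.sum((observations[t] - pred) ** 2, axis=1) / obs_std**2
    m = logw.max()
    log_lik += m + np.log(np.mean(np.exp(logw - m)))  # SMC likelihood estimate
                                                      # (normalising constant omitted)
    w = np.exp(logw - m)
    w /= w.sum()

    # Multinomial resampling so the particles track the posterior of the states.
    idx = rng.choice(n_particles, size=n_particles, p=w)
    keys, values = keys[idx], values[idx]

print("approximate log-likelihood:", log_lik)
```

In a trained model, the projection matrices would be optimised by ascending this particle estimate of the log-likelihood, and the weighted particle predictions at each step would form the predictive distribution mentioned in the abstract; here they are random stand-ins used only to show the filtering mechanics.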