Paper Title
Masked prediction tasks: a parameter identifiability view
Paper Authors
Paper Abstract
The vast majority of work in self-supervised learning, both theoretical and empirical (though mostly the latter), has largely focused on recovering good features for downstream tasks, with the definition of "good" often being intricately tied to the downstream task itself. This lens is undoubtedly very interesting, but suffers from the problem that there isn't a "canonical" set of downstream tasks to focus on -- in practice, this problem is usually resolved by competing on the benchmark dataset du jour. In this paper, we present an alternative lens: one of parameter identifiability. More precisely, we consider data coming from a parametric probabilistic model, and train a self-supervised learning predictor with a suitably chosen parametric form. Then, we ask whether we can read off the ground truth parameters of the probabilistic model from the optimal predictor. We focus on the widely used self-supervised learning method of predicting masked tokens, which is popular for both natural language and visual data. While incarnations of this approach have already been successfully used for simpler probabilistic models (e.g. learning fully-observed undirected graphical models), we focus instead on latent-variable models capturing sequential structures -- namely Hidden Markov Models with both discrete and conditionally Gaussian observations. We show that there is a rich landscape of possibilities, out of which some prediction tasks yield identifiability, while others do not. Our results, borne of a theoretical grounding of self-supervised learning, could thus potentially inform practice. Moreover, we uncover close connections with uniqueness of tensor rank decompositions -- a widely used tool in studying identifiability through the lens of the method of moments.
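To make the setup concrete, the following is a minimal illustrative sketch (not code from the paper): a small discrete Hidden Markov Model, and the Bayes-optimal masked-token predictor P(x_2 | x_1, x_3) computed by brute-force marginalization over hidden paths. The identifiability question in the abstract asks whether the HMM parameters (pi, T, O below) can be read off from such optimal predictors. All sizes and parameter values here are hypothetical.

```python
import numpy as np

K, V = 2, 3  # hypothetical numbers of hidden states and observation symbols

pi = np.array([0.6, 0.4])                        # initial state distribution
T = np.array([[0.7, 0.3], [0.2, 0.8]])           # state transition matrix
O = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]) # emission matrix

def joint(x):
    """P(x_1, x_2, x_3): sum over all hidden-state paths of length 3."""
    p = 0.0
    for h1 in range(K):
        for h2 in range(K):
            for h3 in range(K):
                p += (pi[h1] * O[h1, x[0]]
                      * T[h1, h2] * O[h2, x[1]]
                      * T[h2, h3] * O[h3, x[2]])
    return p

def masked_predictor(x1, x3):
    """Optimal predictor of the masked middle token: P(x_2 | x_1, x_3)."""
    probs = np.array([joint((x1, v, x3)) for v in range(V)])
    return probs / probs.sum()

p = masked_predictor(0, 2)
print(p)  # a distribution over the V possible values of the masked token
```

A self-supervised predictor trained on sequences from this HMM with the middle token masked would, at optimality, match `masked_predictor`; the paper's question is whether that map determines pi, T, and O uniquely.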