Paper Title
Understanding The Robustness of Self-supervised Learning Through Topic Modeling
Paper Authors
Paper Abstract
Self-supervised learning has significantly improved the performance of many NLP tasks. However, how self-supervised learning discovers useful representations, and why it outperforms traditional approaches such as probabilistic models, remain largely unknown. In this paper, we focus on the context of topic modeling and highlight a key advantage of self-supervised learning: when applied to data generated by topic models, self-supervised learning can be oblivious to the specific model, and hence is less susceptible to model misspecification. In particular, we prove that commonly used self-supervised objectives based on reconstruction or contrastive samples can both recover useful posterior information for general topic models. Empirically, we show that the same objectives can perform on par with posterior inference using the correct model, while outperforming posterior inference using misspecified models.
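To make the setting concrete, here is a minimal toy sketch of the kind of experiment the abstract describes: documents are sampled from an LDA-style topic model, a reconstruction-style self-supervised objective is fit without ever seeing the generative parameters, and the learned representation is checked for topic-posterior information. This is an illustrative assumption: the linear split-and-reconstruct objective, all hyperparameters, and the evaluation are our own choices, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, D, N = 5, 100, 1000, 60   # topics, vocab size, documents, words per doc

# Generate data from an LDA-style topic model (illustrative hyperparameters).
topics = rng.dirichlet(0.1 * np.ones(V), size=K)   # K x V topic-word matrix
theta  = rng.dirichlet(0.5 * np.ones(K), size=D)   # D x K doc-topic mixtures

X = np.zeros((D, V))   # bag-of-words of the first half of each document
Y = np.zeros((D, V))   # bag-of-words of the second half
for d in range(D):
    z = rng.choice(K, size=N, p=theta[d])          # latent topic per word
    w = np.array([rng.choice(V, p=topics[k]) for k in z])
    np.add.at(X[d], w[: N // 2], 1)
    np.add.at(Y[d], w[N // 2:], 1)

# Reconstruction-style self-supervised objective: linearly predict the second
# half of each document from the first half (ridge regression). Note that this
# step never touches `topics` or `theta`, i.e. it is oblivious to the
# generative model, which is the point emphasized in the abstract.
lam = 1e-2
W = np.linalg.solve(X.T @ X + lam * np.eye(V), X.T @ Y)
reps = X @ W                                       # learned representations

# Sanity check: the representation should carry topic-posterior information;
# here we measure how well it linearly recovers the true mixture weights.
A, *_ = np.linalg.lstsq(reps, theta, rcond=None)
r2 = 1 - np.sum((theta - reps @ A) ** 2) / np.sum((theta - theta.mean(0)) ** 2)
print(f"R^2 of topic mixtures recovered from SSL representation: {r2:.3f}")
```

With the true generative model in hand one could instead run posterior inference directly; the abstract's empirical claim is that the self-supervised route matches that baseline while degrading more gracefully when the assumed model is misspecified.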