Paper Title
The Optimal Noise in Noise-Contrastive Learning Is Not What You Think
Paper Authors
Paper Abstract
Learning a parametric model of a data distribution is a well-known statistical problem that has seen renewed interest as it is brought to scale in deep learning. Framing the problem as a self-supervised task, where data samples are discriminated from noise samples, is at the core of state-of-the-art methods, beginning with Noise-Contrastive Estimation (NCE). Yet, such contrastive learning requires a good noise distribution, which is hard to specify; domain-specific heuristics are therefore widely used. While a comprehensive theory is missing, it is widely assumed that the optimal noise should in practice be made equal to the data, both in distribution and proportion. This setting underlies Generative Adversarial Networks (GANs) in particular. Here, we empirically and theoretically challenge this assumption on the optimal noise. We show that deviating from this assumption can actually lead to better statistical estimators, in terms of asymptotic variance. In particular, the optimal noise distribution is different from the data's and even from a different family.
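For reference, a minimal sketch of the contrastive objective the abstract refers to, in the standard NCE notation (p_d for the data distribution, p_n for the noise distribution, p_theta for the parametric model, and nu for the noise-to-data sample ratio; these symbols are assumptions of this sketch, not taken from the abstract itself):

\[
\mathcal{J}(\theta) \;=\; \mathbb{E}_{x \sim p_d}\!\left[\log \frac{p_\theta(x)}{p_\theta(x) + \nu\, p_n(x)}\right] \;+\; \nu\, \mathbb{E}_{y \sim p_n}\!\left[\log \frac{\nu\, p_n(y)}{p_\theta(y) + \nu\, p_n(y)}\right]
\]

Maximizing this objective over theta amounts to discriminating data samples from noise samples, and the quality of the resulting estimator, measured by its asymptotic variance as in the abstract, depends on the choice of p_n and nu.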