Paper title
Faster Wasserstein Distance Estimation with the Sinkhorn Divergence
Paper authors
Paper abstract
The squared Wasserstein distance is a natural quantity to compare probability distributions in a non-parametric setting. This quantity is usually estimated with the plug-in estimator, defined via a discrete optimal transport problem which can be solved to $ε$-accuracy by adding an entropic regularization of order $ε$ and using for instance Sinkhorn's algorithm. In this work, we propose instead to estimate it with the Sinkhorn divergence, which is also built on entropic regularization but includes debiasing terms. We show that, for smooth densities, this estimator has a comparable sample complexity but allows higher regularization levels, of order $ε^{1/2}$, which leads to improved computational complexity bounds and a strong speedup in practice. Our theoretical analysis covers the case of both randomly sampled densities and deterministic discretizations on uniform grids. We also propose and analyze an estimator based on Richardson extrapolation of the Sinkhorn divergence which enjoys improved statistical and computational efficiency guarantees, under a condition on the regularity of the approximation error, which is in particular satisfied for Gaussian densities. We finally demonstrate the efficiency of the proposed estimators with numerical experiments.
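To make the estimators in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a squared Euclidean cost, strictly positive weights, and an entropic cost written as $T_{reg}(\mu,\nu) = \min_P \langle C, P\rangle + reg \cdot \mathrm{KL}(P \mid \mu\otimes\nu)$; the paper's normalization of the regularization parameter may differ. The Sinkhorn divergence is then taken as $S_{reg}(\mu,\nu) = T_{reg}(\mu,\nu) - \tfrac12 T_{reg}(\mu,\mu) - \tfrac12 T_{reg}(\nu,\nu)$, where the last two terms are the debiasing terms. The extrapolation step additionally assumes that the leading error of $S_{reg}$ is proportional to $reg^2$, in which case $2 S_{reg} - S_{\sqrt{2}\,reg}$ cancels it. All function and parameter names (`entropic_ot_cost`, `sinkhorn_divergence`, `richardson_estimate`, `reg`, `n_iter`) are hypothetical and chosen for illustration.

```python
import numpy as np


def _logsumexp(z, axis):
    # Numerically stable log(sum(exp(z))) along one axis.
    m = np.max(z, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(z - m), axis=axis))


def entropic_ot_cost(x, a, y, b, reg, n_iter=500):
    """Approximate min_P <C, P> + reg * KL(P | a (x) b) via log-domain Sinkhorn.

    x: (n, d) points with weights a (n,), y: (m, d) points with weights b (m,).
    Assumes a squared Euclidean cost and strictly positive weights summing to 1.
    """
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    log_a, log_b = np.log(a), np.log(b)
    f = np.zeros(len(a))
    g = np.zeros(len(b))
    for _ in range(n_iter):
        # Alternating updates of the dual potentials (Sinkhorn iterations).
        f = -reg * _logsumexp((g[None, :] - C) / reg + log_b[None, :], axis=1)
        g = -reg * _logsumexp((f[:, None] - C) / reg + log_a[:, None], axis=0)
    # Right after the g-update the second marginal constraint holds,
    # so the dual objective reduces to <a, f> + <b, g>.
    return float(a @ f + b @ g)


def sinkhorn_divergence(x, a, y, b, reg, n_iter=500):
    # Entropic cost minus the two debiasing (self-transport) terms.
    return (
        entropic_ot_cost(x, a, y, b, reg, n_iter)
        - 0.5 * entropic_ot_cost(x, a, x, a, reg, n_iter)
        - 0.5 * entropic_ot_cost(y, b, y, b, reg, n_iter)
    )


def richardson_estimate(x, a, y, b, reg, n_iter=500):
    # Assumes the leading error of the divergence scales like reg**2,
    # so combining the levels reg and sqrt(2) * reg cancels that term.
    s1 = sinkhorn_divergence(x, a, y, b, reg, n_iter)
    s2 = sinkhorn_divergence(x, a, y, b, np.sqrt(2.0) * reg, n_iter)
    return 2.0 * s1 - s2
```

A convenient sanity check is a pure translation. If the second point cloud is the first one shifted by a vector `s`, with identical weights, the linear part of the cost is absorbed by the dual potentials. Both estimators should then return approximately `|s|**2` for any `reg`, once the iterations have converged. The fixed iteration count is a simplification; a practical implementation would stop on a marginal-error tolerance instead.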