Title
Hilbert Curve Projection Distance for Distribution Comparison
Authors
Abstract
Distribution comparison plays a central role in many machine learning tasks like data classification and generative modeling. In this study, we propose a novel metric, called Hilbert curve projection (HCP) distance, to measure the distance between two probability distributions with low complexity. In particular, we first project two high-dimensional probability distributions using the Hilbert curve to obtain a coupling between them, and then calculate the transport distance between these two distributions in the original space, according to the coupling. We show that the HCP distance is a proper metric and is well-defined for probability measures with bounded supports. Furthermore, we demonstrate that the modified empirical HCP distance with the $L_p$ cost in $d$-dimensional space converges to its population counterpart at a rate of no more than $O(n^{-1/(2\max\{d,p\})})$. To suppress the curse of dimensionality, we also develop two variants of the HCP distance using (learnable) subspace projections. Experiments on both synthetic and real-world data show that our HCP distance works as an effective surrogate of the Wasserstein distance with low complexity and overcomes the drawbacks of the sliced Wasserstein distance.
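The projection-then-transport recipe described in the abstract can be sketched for the simplest setting: two equal-size empirical measures with uniform weights on $[0,1]^2$. The sketch below is illustrative, not the paper's implementation; the function names, the grid order, and the restriction to 2-D are assumptions made here for brevity. Each sample is mapped to its index along a discrete Hilbert curve, both samples are sorted by that index (this defines the coupling), and the transport cost is then paid with the original Euclidean coordinates, not the 1-D projections.

```python
# Hedged sketch of an HCP-style distance for equal-size samples in [0, 1]^2.
# xy2d is the classic Hilbert curve index routine; hcp_distance, its `order`
# parameter, and the uniform-weight matching are illustrative choices.
import numpy as np

def xy2d(n, x, y):
    """Map grid cell (x, y) to its index along an order-log2(n) Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/reflect the quadrant so the curve stays connected
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d

def hcp_distance(X, Y, p=2, order=8):
    """Sort both samples by Hilbert index, then pay the L_p transport cost
    of the induced one-to-one matching in the ORIGINAL space."""
    n_side = 1 << order  # grid resolution used to discretize the curve
    def keys(P):
        Q = np.clip((P * n_side).astype(int), 0, n_side - 1)
        return np.array([xy2d(n_side, int(a), int(b)) for a, b in Q])
    Xs = X[np.argsort(keys(X), kind="stable")]
    Ys = Y[np.argsort(keys(Y), kind="stable")]
    costs = np.linalg.norm(Xs - Ys, axis=1) ** p
    return float(np.mean(costs) ** (1.0 / p))
```

Because sorting by Hilbert index gives a monotone coupling along the curve, the matching costs only $O(n \log n)$ time, versus the super-quadratic cost of exact optimal transport; unlike a sliced (1-D) Wasserstein surrogate, the final cost is still evaluated between the full-dimensional points.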