匹配地图恢复与未知数的异常值

论文标题

匹配地图恢复与未知数的异常值

Matching Map Recovery with an Unknown Number of Outliers

论文作者

Minasyan, Arshak, Galstyan, Tigran, Hunanyan, Sona, Dalalyan, Arnak

论文摘要

我们考虑在两组$ d $维嘈杂的功能向量之间找到匹配地图的问题。设置的独特特征是，我们不假定第一组的所有向量在第二组中都具有相应的向量。如果$ n $和$ m $是这两组的尺寸，我们假设应恢复的匹配地图是在未知基数的子集$ k^*\ le \ min（n，m）$的子集中定义的。我们表明，在高维设置中，如果信噪比大于$ 5（d \ log（4nm/α））^{1/4} $，则可以以$ 1-α$的概率恢复真实的匹配地图。有趣的是，此阈值不取决于$ k^*$，并且与$ k = \ min（n，m）$的先前工作中获得的阈值相同。通过在[\ min（n，m）] \} $之间进行数据驱动的映射$ \ {\hatπ_K：k \ k \ k \ k \ in [\hatπ_K：k \} $，通过数据驱动的选择获得了上述属性的过程。每个$ \hatπ_K$最小化两组尺寸$ k $之间的距离正方形之和。所得的优化问题可以作为最低成本流问题提出，因此可以有效地解决。最后，我们报告了关于合成和现实世界数据的数值实验的结果，这些结果说明了我们的理论结果，并提供了对这项工作中研究算法的属性的进一步见解。

We consider the problem of finding the matching map between two sets of $d$-dimensional noisy feature-vectors. The distinctive feature of our setting is that we do not assume that all the vectors of the first set have their corresponding vector in the second set. If $n$ and $m$ are the sizes of these two sets, we assume that the matching map that should be recovered is defined on a subset of unknown cardinality $k^*\le \min(n,m)$. We show that, in the high-dimensional setting, if the signal-to-noise ratio is larger than $5(d\log(4nm/α))^{1/4}$, then the true matching map can be recovered with probability $1-α$. Interestingly, this threshold does not depend on $k^*$ and is the same as the one obtained in prior work in the case of $k = \min(n,m)$. The procedure for which the aforementioned property is proved is obtained by a data-driven selection among candidate mappings $\{\hatπ_k:k\in[\min(n,m)]\}$. Each $\hatπ_k$ minimizes the sum of squares of distances between two sets of size $k$. The resulting optimization problem can be formulated as a minimum-cost flow problem, and thus solved efficiently. Finally, we report the results of numerical experiments on both synthetic and real-world data that illustrate our theoretical results and provide further insight into the properties of the algorithms studied in this work.

下载PDF全文

下载文献需遵守相关版权规定

论文标题