论文标题
匹配地图恢复与未知数的异常值
Matching Map Recovery with an Unknown Number of Outliers
论文作者
论文摘要
我们考虑在两组$ d $维嘈杂的功能向量之间找到匹配地图的问题。设置的独特特征是,我们不假定第一组的所有向量在第二组中都具有相应的向量。如果$ n $和$ m $是这两组的尺寸,我们假设应恢复的匹配地图是在未知基数的子集$ k^*\ le \ min(n,m)$的子集中定义的。我们表明,在高维设置中,如果信噪比大于$ 5(d \ log(4nm/α))^{1/4} $,则可以以$ 1-α$的概率恢复真实的匹配地图。有趣的是,此阈值不取决于$ k^*$,并且与$ k = \ min(n,m)$的先前工作中获得的阈值相同。通过在[\ min(n,m)] \} $之间进行数据驱动的映射$ \ {\hatπ_K:k \ k \ k \ k \ in [\hatπ_K:k \} $,通过数据驱动的选择获得了上述属性的过程。每个$ \hatπ_K$最小化两组尺寸$ k $之间的距离正方形之和。所得的优化问题可以作为最低成本流问题提出,因此可以有效地解决。最后,我们报告了关于合成和现实世界数据的数值实验的结果,这些结果说明了我们的理论结果,并提供了对这项工作中研究算法的属性的进一步见解。
We consider the problem of finding the matching map between two sets of $d$-dimensional noisy feature-vectors. The distinctive feature of our setting is that we do not assume that all the vectors of the first set have their corresponding vector in the second set. If $n$ and $m$ are the sizes of these two sets, we assume that the matching map that should be recovered is defined on a subset of unknown cardinality $k^*\le \min(n,m)$. We show that, in the high-dimensional setting, if the signal-to-noise ratio is larger than $5(d\log(4nm/α))^{1/4}$, then the true matching map can be recovered with probability $1-α$. Interestingly, this threshold does not depend on $k^*$ and is the same as the one obtained in prior work in the case of $k = \min(n,m)$. The procedure for which the aforementioned property is proved is obtained by a data-driven selection among candidate mappings $\{\hatπ_k:k\in[\min(n,m)]\}$. Each $\hatπ_k$ minimizes the sum of squares of distances between two sets of size $k$. The resulting optimization problem can be formulated as a minimum-cost flow problem, and thus solved efficiently. Finally, we report the results of numerical experiments on both synthetic and real-world data that illustrate our theoretical results and provide further insight into the properties of the algorithms studied in this work.