任何对称标准的快速距离甲骨文

论文标题

任何对称标准的快速距离甲骨文

Fast Distance Oracles for Any Symmetric Norm

论文作者

Deng, Yichuan, Song, Zhao, Weinstein, Omri, Zhang, Ruizhe

论文摘要

在距离甲骨文的问题中，目标是在$ d $ -D $ -Dimensional Metric Space $（\ Mathbb {x}^d，\ | \ | \ | \ cdot \ | _l）$中，将$ d $ d $ d $ dipermensional公制空间$（\ mathbb {x}输入数据点的$ s \ subseteq [n] $，所有距离$ \ | q -x_i \ | _l $对于$ x_i \在s $中可以快速近似（比琐碎的$ \ sim d | s | $ query时间更快）。该原始性是机器学习，数据挖掘和相似性搜索应用程序中的基本子例程。在$ \ ell_p $ norms的情况下，问题已经充分理解，最佳数据结构以$ p $的大多数值而闻名。我们的主要贡献是对于任何对称的norm $ \ | \ cdot \ | _l $的快速$（1+ \ varepsilon）$距离。此类包括$ \ ell_p $ norms和Orlicz规范作为特殊情况，以及实践中使用的其他规范，例如$ \ ell_p $ norms，小型支持规范和盒子 - 框架的顶部 - $ k $ norms，max-mixture和sum-mixture。 We propose a novel data structure with $\tilde{O}(n (d + \mathrm{mmc}(l)^2 ) )$ preprocessing time and space, and $t_q = \tilde{O}(d + |S| \cdot \mathrm{mmc}(l)^2)$ query time, for computing distances to a subset $S$ of data points,其中$ \ mathrm {mmc}（l）$是对称标准的复杂度量（浓度模量）。当$ l = \ ell_ {p} $时，此运行时与上述最新甲壳相匹配。

In the Distance Oracle problem, the goal is to preprocess $n$ vectors $x_1, x_2, \cdots, x_n$ in a $d$-dimensional metric space $(\mathbb{X}^d, \| \cdot \|_l)$ into a cheap data structure, so that given a query vector $q \in \mathbb{X}^d$ and a subset $S\subseteq [n]$ of the input data points, all distances $\| q - x_i \|_l$ for $x_i\in S$ can be quickly approximated (faster than the trivial $\sim d|S|$ query time). This primitive is a basic subroutine in machine learning, data mining and similarity search applications. In the case of $\ell_p$ norms, the problem is well understood, and optimal data structures are known for most values of $p$. Our main contribution is a fast $(1+\varepsilon)$ distance oracle for any symmetric norm $\|\cdot\|_l$. This class includes $\ell_p$ norms and Orlicz norms as special cases, as well as other norms used in practice, e.g. top-$k$ norms, max-mixture and sum-mixture of $\ell_p$ norms, small-support norms and the box-norm. We propose a novel data structure with $\tilde{O}(n (d + \mathrm{mmc}(l)^2 ) )$ preprocessing time and space, and $t_q = \tilde{O}(d + |S| \cdot \mathrm{mmc}(l)^2)$ query time, for computing distances to a subset $S$ of data points, where $\mathrm{mmc}(l)$ is a complexity-measure (concentration modulus) of the symmetric norm. When $l = \ell_{p}$ , this runtime matches the aforementioned state-of-art oracles.

下载PDF全文

下载文献需遵守相关版权规定

论文标题