论文标题

任何对称标准的快速距离甲骨文

Fast Distance Oracles for Any Symmetric Norm

论文作者

Deng, Yichuan, Song, Zhao, Weinstein, Omri, Zhang, Ruizhe

论文摘要

在距离甲骨文的问题中,目标是在$ d $ -D $ -Dimensional Metric Space $(\ Mathbb {x}^d,\ | \ | \ | \ cdot \ | _l)$中,将$ d $ d $ d $ dipermensional公制空间$(\ mathbb {x}输入数据点的$ s \ subseteq [n] $,所有距离$ \ | q -x_i \ | _l $对于$ x_i \在s $中可以快速近似(比琐碎的$ \ sim d | s | $ query时间更快)。该原始性是机器学习,数据挖掘和相似性搜索应用程序中的基本子例程。在$ \ ell_p $ norms的情况下,问题已经充分理解,最佳数据结构以$ p $的大多数值而闻名。 我们的主要贡献是对于任何对称的norm $ \ | \ cdot \ | _l $的快速$(1+ \ varepsilon)$距离。此类包括$ \ ell_p $ norms和Orlicz规范作为特殊情况,以及实践中使用的其他规范,例如$ \ ell_p $ norms,小型支持规范和盒子 - 框架的顶部 - $ k $ norms,max-mixture和sum-mixture。 We propose a novel data structure with $\tilde{O}(n (d + \mathrm{mmc}(l)^2 ) )$ preprocessing time and space, and $t_q = \tilde{O}(d + |S| \cdot \mathrm{mmc}(l)^2)$ query time, for computing distances to a subset $S$ of data points,其中$ \ mathrm {mmc}(l)$是对称标准的复杂度量(浓度模量)。当$ l = \ ell_ {p} $时,此运行时与上述最新甲壳相匹配。

In the Distance Oracle problem, the goal is to preprocess $n$ vectors $x_1, x_2, \cdots, x_n$ in a $d$-dimensional metric space $(\mathbb{X}^d, \| \cdot \|_l)$ into a cheap data structure, so that given a query vector $q \in \mathbb{X}^d$ and a subset $S\subseteq [n]$ of the input data points, all distances $\| q - x_i \|_l$ for $x_i\in S$ can be quickly approximated (faster than the trivial $\sim d|S|$ query time). This primitive is a basic subroutine in machine learning, data mining and similarity search applications. In the case of $\ell_p$ norms, the problem is well understood, and optimal data structures are known for most values of $p$. Our main contribution is a fast $(1+\varepsilon)$ distance oracle for any symmetric norm $\|\cdot\|_l$. This class includes $\ell_p$ norms and Orlicz norms as special cases, as well as other norms used in practice, e.g. top-$k$ norms, max-mixture and sum-mixture of $\ell_p$ norms, small-support norms and the box-norm. We propose a novel data structure with $\tilde{O}(n (d + \mathrm{mmc}(l)^2 ) )$ preprocessing time and space, and $t_q = \tilde{O}(d + |S| \cdot \mathrm{mmc}(l)^2)$ query time, for computing distances to a subset $S$ of data points, where $\mathrm{mmc}(l)$ is a complexity-measure (concentration modulus) of the symmetric norm. When $l = \ell_{p}$ , this runtime matches the aforementioned state-of-art oracles.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源