论文标题

局部敏感的哈希技术的实验分析,用于高维近似近似邻居搜索

Experimental Analysis of Locality Sensitive Hashing Techniques for High-Dimensional Approximate Nearest Neighbor Searches

论文作者

Jafari, Omid, Nagarkar, Parth

论文摘要

在许多多媒体检索应用中,在高维空间中找到最近的邻居是一个基本操作。已知确切的基于树的索引方法受到高维数据的臭名昭著的维度诅咒。近似搜索技术牺牲了一些准确性,同时返回足够好的结果以获得更快的性能。局部敏感的哈希(LSH)是一种非常流行的技术,用于在高维空间中找到大约最近的邻居。除了在查询结果上提供理论保证外,LSH技术的主要好处之一是它们对大型数据集的良好可扩展性,因为它们是基于外部内存的。现有LSH技术的最主要成本是算法时间和找到候选点所需的索引I/OS。现有作品在评估中没有比较这两个主要成本。在这篇实验调查文件中,我们展示了这两种成本对LSH技术总体性能的影响。我们比较了四个现实世界数据集上的三种最先进的技术,并表明,与最近的作品相比,C2LSH仍然是性能的最新算法,同时达到与最近的竞争对手相似的准确性。

Finding nearest neighbors in high-dimensional spaces is a fundamental operation in many multimedia retrieval applications. Exact tree-based indexing approaches are known to suffer from the notorious curse of dimensionality for high-dimensional data. Approximate searching techniques sacrifice some accuracy while returning good enough results for faster performance. Locality Sensitive Hashing (LSH) is a very popular technique for finding approximate nearest neighbors in high-dimensional spaces. Apart from providing theoretical guarantees on the query results, one of the main benefits of LSH techniques is their good scalability to large datasets because they are external memory based. The most dominant costs for existing LSH techniques are the algorithm time and the index I/Os required to find candidate points. Existing works do not compare both of these dominant costs in their evaluation. In this experimental survey paper, we show the impact of both these costs on the overall performance of the LSH technique. We compare three state-of-the-art techniques on four real-world datasets, and show that, in contrast to recent works, C2LSH is still the state-of-the-art algorithm in terms of performance while achieving similar accuracy as its recent competitors.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源