Paper Title
Cross-View Image Retrieval -- Ground to Aerial Image Retrieval through Deep Learning
Paper Authors
Paper Abstract
Cross-modal retrieval aims to measure the content similarity between different types of data. The idea has previously been applied to visual, text, and speech data. In this paper, we present a novel cross-modal retrieval method specifically for multi-view images, called Cross-view Image Retrieval (CVIR). Our approach aims to find a feature space as well as an embedding space in which samples from street-view images are compared directly to satellite-view images (and vice versa). For this comparison, a novel deep metric learning based solution, "DeepCVIR", is proposed. Previous cross-view image datasets are deficient in that they (1) lack class information; (2) were originally collected as coupled images for the cross-view image geolocalization task; and (3) do not include any images from off-street locations. To train, compare, and evaluate the performance of cross-view image retrieval, we present a new six-class cross-view image dataset termed CrossViewRet, which comprises the classes freeway, mountain, palace, river, ship, and stadium, with 700 high-resolution dual-view images per class. Results show that the proposed DeepCVIR outperforms conventional matching approaches on the CVIR task for the given dataset and will also serve as a baseline for future research.
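The abstract describes learning a shared embedding space, via deep metric learning, in which street-view and satellite-view images can be compared directly. The sketch below illustrates that general idea only; it is not the authors' DeepCVIR architecture, and the names (`CrossViewEmbedder`, the choice of ResNet-18 backbones, the triplet margin) are illustrative assumptions.

```python
# Minimal sketch of cross-view metric learning (NOT the paper's DeepCVIR):
# two encoders project street-view and aerial images into one embedding
# space, and a triplet loss pulls matching pairs together. Assumes PyTorch
# and torchvision are installed.
import torch
import torch.nn as nn
from torchvision import models


class CrossViewEmbedder(nn.Module):
    """Two-branch network: one encoder per view, projecting into a shared space."""

    def __init__(self, embed_dim: int = 256):
        super().__init__()
        # Separate backbones, since ground and aerial imagery differ greatly in appearance.
        self.street_backbone = models.resnet18(weights=None)
        self.aerial_backbone = models.resnet18(weights=None)
        in_features = self.street_backbone.fc.in_features
        # Replace the classification heads with projections into the embedding space.
        self.street_backbone.fc = nn.Linear(in_features, embed_dim)
        self.aerial_backbone.fc = nn.Linear(in_features, embed_dim)

    def forward(self, street_img: torch.Tensor, aerial_img: torch.Tensor):
        # L2-normalize so that Euclidean distance reflects cosine similarity.
        street_emb = nn.functional.normalize(self.street_backbone(street_img), dim=1)
        aerial_emb = nn.functional.normalize(self.aerial_backbone(aerial_img), dim=1)
        return street_emb, aerial_emb


# Usage: pull matching street/aerial pairs together, push non-matching pairs apart.
model = CrossViewEmbedder()
triplet_loss = nn.TripletMarginLoss(margin=0.3)

street = torch.randn(8, 3, 224, 224)       # anchor: street-view batch
aerial_pos = torch.randn(8, 3, 224, 224)   # positive: matching satellite views
aerial_neg = torch.randn(8, 3, 224, 224)   # negative: non-matching satellite views

anchor, positive = model(street, aerial_pos)
_, negative = model(street, aerial_neg)
loss = triplet_loss(anchor, positive, negative)
loss.backward()
```

At retrieval time, one would embed the query street-view image and rank all satellite-view embeddings by distance (or vice versa); the class labels in a dataset such as CrossViewRet could then be used to judge whether retrieved images are relevant.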