Paper Title
Coded Residual Transform for Generalizable Deep Metric Learning
Paper Authors
Paper Abstract
A fundamental challenge in deep metric learning is the generalization capability of the feature embedding network, since an embedding network learned on training classes must be evaluated on new test classes. To address this challenge, in this paper we introduce a new method called coded residual transform (CRT) for deep metric learning, which significantly improves its generalization capability. Specifically, we learn a set of diversified prototype features, project the feature map onto each prototype, and then encode the features using their projection residuals weighted by their correlation coefficients with each prototype. The proposed CRT method has the following two unique characteristics. First, it represents and encodes the feature map from a set of complementary perspectives based on projections onto diversified prototypes. Second, unlike existing transformer-based feature representation approaches, which encode the original values of features based on global correlation analysis, the proposed coded residual transform encodes the relative differences between the original features and their projected prototypes. Embedding space density and spectral decay analyses show that this multi-perspective projection onto diversified prototypes, together with the coded residual representation, achieves significantly improved generalization capability in metric learning. Finally, to further enhance generalization performance, we propose to enforce consistency between the feature similarity matrices of coded residual transforms with different numbers of projection prototypes and embedding dimensions. Our extensive experimental results and ablation studies demonstrate that the proposed CRT method outperforms state-of-the-art deep metric learning methods by large margins, improving upon the current best method by up to 4.28% on the CUB dataset.
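To make the encoding step concrete, below is a minimal PyTorch sketch of the core CRT idea as described in the abstract: local features are compared against a set of learned prototypes, and correlation-weighted projection residuals are aggregated into the final embedding. All module and parameter names (`CodedResidualTransform`, `num_prototypes`, etc.) are illustrative assumptions, not the authors' released code, and the prototype-diversity objective of the full method is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CodedResidualTransform(nn.Module):
    """Sketch of the coded residual transform idea: project local
    features onto learned prototypes and aggregate the projection
    residuals, weighted by feature-prototype correlation coefficients.
    Names and layout are assumptions, not the paper's exact code."""

    def __init__(self, feat_dim: int, num_prototypes: int, embed_dim: int):
        super().__init__()
        # Learned prototype features; the paper additionally encourages
        # diversity among prototypes, which this sketch leaves out.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, feat_dim))
        # Map the aggregated coded residuals to the embedding dimension.
        self.proj = nn.Linear(num_prototypes * feat_dim, embed_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, D) local features, e.g. a flattened CNN feature map.
        f = F.normalize(feats, dim=-1)                    # (B, N, D)
        p = F.normalize(self.prototypes, dim=-1)          # (M, D)
        # Correlation coefficients between each feature and each prototype.
        corr = torch.einsum('bnd,md->bnm', f, p)          # (B, N, M)
        w = corr.softmax(dim=-1)                          # soft assignment weights
        # Residuals between features and prototypes: (B, N, M, D).
        resid = f.unsqueeze(2) - p.view(1, 1, *p.shape)
        # Correlation-weighted residuals, aggregated over spatial locations.
        coded = (w.unsqueeze(-1) * resid).sum(dim=1)      # (B, M, D)
        out = self.proj(coded.flatten(1))                 # (B, embed_dim)
        return F.normalize(out, dim=-1)
```

For example, `CodedResidualTransform(feat_dim=512, num_prototypes=12, embed_dim=128)` applied to a tensor of shape `(8, 49, 512)` (a 7x7 backbone feature map for a batch of 8 images) would yield 128-dimensional L2-normalized embeddings. The similarity-matrix consistency objective between two CRT branches with different prototype counts and embedding dimensions could then look like the following sketch, assuming both branches output L2-normalized embeddings as above:

```python
def similarity_consistency_loss(emb_a: torch.Tensor, emb_b: torch.Tensor) -> torch.Tensor:
    # Pairwise cosine-similarity matrices within the batch for the two
    # CRT branches (embeddings assumed L2-normalized, as in the sketch above).
    sim_a = emb_a @ emb_a.t()   # (B, B)
    sim_b = emb_b @ emb_b.t()   # (B, B)
    # Penalize disagreement between the two similarity structures.
    return F.mse_loss(sim_a, sim_b)
```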