Cross-Domain Structure Preserving Projection for Heterogeneous Domain Adaptation

Paper Title

Cross-Domain Structure Preserving Projection for Heterogeneous Domain Adaptation

Authors

Wang, Qian; Breckon, Toby P.

Abstract

Heterogeneous Domain Adaptation (HDA) addresses transfer learning problems in which data from the source and target domains are of different modalities (e.g., texts and images) or feature dimensions (e.g., features extracted with different methods). It is useful for multi-modal data analysis. Traditional domain adaptation algorithms assume that the representations of source and target samples reside in the same feature space, and hence are likely to fail on the heterogeneous domain adaptation problem. Contemporary state-of-the-art HDA approaches usually rely on complex optimization objectives to achieve favourable performance, and are therefore computationally expensive and less generalizable. To address these issues, we propose a novel Cross-Domain Structure Preserving Projection (CDSPP) algorithm for HDA. As an extension of the classic Locality Preserving Projections (LPP) to heterogeneous domains, CDSPP aims to learn domain-specific projections that map sample features from the source and target domains into a common subspace such that class consistency is preserved and data distributions are sufficiently aligned. CDSPP is simple and has a deterministic solution, obtained by solving a generalized eigenvalue problem. It is naturally suited to supervised HDA but has also been extended to semi-supervised HDA, where unlabelled target domain samples are available. Extensive experiments have been conducted on commonly used HDA benchmark datasets (i.e., Office-Caltech, Multilingual Reuters Collection, NUS-WIDE-ImageNet) as well as on the Office-Home dataset, which we introduce to HDA for the first time because it has a significantly larger number of classes than the existing ones (65 vs 10, 6 and 8). The experimental results for both supervised and semi-supervised HDA demonstrate the superior performance of our proposed method against contemporary state-of-the-art methods.
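The idea of learning domain-specific projections through a generalized eigenvalue problem can be illustrated with a small sketch. The code below is not the authors' exact CDSPP objective, which the abstract does not spell out. It is a generic supervised, LPP-style formulation under stated assumptions: the function name `fit_projections`, the binary same-class similarity matrix `W`, and the ridge parameter `alpha` are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_projections(Xs, ys, Xt, yt, dim, alpha=1e-3):
    """LPP-style sketch (assumed formulation, not the paper's exact objective).

    Xs: (ns, ds) labelled source features; Xt: (nt, dt) labelled target features.
    Returns Ps (ds, dim) and Pt (dt, dim), mapping both domains to one subspace.
    """
    ns, ds = Xs.shape
    nt, dt = Xt.shape
    # Block-diagonal data matrix: one column per sample in the stacked
    # (ds + dt)-dimensional space, so each domain gets its own projection rows.
    X = np.zeros((ds + dt, ns + nt))
    X[:ds, :ns] = Xs.T
    X[ds:, ns:] = Xt.T
    y = np.concatenate([ys, yt])
    # Assumption: similarity is 1 for same-class pairs (within and across domains).
    W = (y[:, None] == y[None, :]).astype(float)
    D = np.diag(W.sum(axis=1))
    A = X @ W @ X.T
    B = X @ D @ X.T + alpha * np.eye(ds + dt)  # ridge term keeps B positive definite
    # Solve A p = lambda * B p by reducing to a symmetric standard problem
    # with the Cholesky factor B = C C^T.
    C = np.linalg.cholesky(B)
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T
    M = (M + M.T) / 2  # remove round-off asymmetry before eigh
    evals, evecs = np.linalg.eigh(M)
    top = np.argsort(evals)[::-1][:dim]  # keep the largest eigenvalues
    P = C_inv.T @ evecs[:, top]
    return P[:ds], P[ds:]
```

Given the returned matrices, source samples are projected as `Xs @ Ps` and target samples as `Xt @ Pt`; both land in the same `dim`-dimensional space even though `ds` and `dt` differ, and a simple classifier such as nearest neighbour could then be applied there. The solution is deterministic because it comes from a single eigendecomposition rather than iterative optimization.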
