Theoretical Guarantees for Domain Adaptation with Hierarchical Optimal Transport

Paper title

Theoretical Guarantees for Domain Adaptation with Hierarchical Optimal Transport

Authors

Hamri, Mourad El; Bennani, Younès; Falih, Issam

Abstract

Domain adaptation arises as an important problem in statistical learning theory when the data-generating processes differ between training and test samples, respectively called the source and target domains. Recent theoretical advances show that the success of domain adaptation algorithms relies heavily on their ability to minimize the divergence between the probability distributions of the source and target domains. However, this divergence cannot be minimized independently of other key ingredients, such as the source risk or the combined error of the ideal joint hypothesis. The trade-off between these terms is often handled by algorithmic solutions that remain implicit and are not directly reflected in the theoretical guarantees. To get to the bottom of this issue, we propose in this paper a new theoretical framework for domain adaptation through hierarchical optimal transport. This framework provides more explicit generalization bounds and allows us to account for the natural hierarchical organization of samples in both domains into classes or clusters. Additionally, we provide a new divergence measure between the source and target domains, called the Hierarchical Wasserstein distance, that indicates, under mild assumptions, which structures have to be aligned for adaptation to succeed.
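A minimal sketch of the two-level idea, assuming the Hierarchical Wasserstein distance is the optimal transport cost between the two domains viewed as weighted collections of groups (source classes, target clusters), where the ground cost between two groups is itself the Wasserstein distance between their empirical distributions. This is the generic hierarchical OT recipe and may differ in detail from the paper's definition. The function names (exact_ot, wasserstein, hierarchical_wasserstein), the Euclidean ground metric, uniform weights inside each group, and p = 2 are illustrative assumptions, not taken from the paper. Both levels are solved exactly as linear programs with SciPy's linprog.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical helpers: a generic hierarchical OT sketch, not necessarily
# the paper's exact definition of the Hierarchical Wasserstein distance.


def exact_ot(a, b, cost):
    """Solve min_P <P, cost> s.t. P 1 = a, P^T 1 = b, P >= 0 as an LP."""
    n, m = cost.shape
    # Row-sum constraints on the row-major flattened plan: sum_j P[i, j] = a[i]
    row = np.kron(np.eye(n), np.ones((1, m)))
    # Column-sum constraints: sum_i P[i, j] = b[j]
    col = np.kron(np.ones((1, n)), np.eye(m))
    res = linprog(cost.ravel(), A_eq=np.vstack([row, col]),
                  b_eq=np.concatenate([a, b]), bounds=(0, None),
                  method="highs")
    # Clip tiny negative round-off so fractional powers stay real
    return max(float(res.fun), 0.0)


def wasserstein(x, y, p=2):
    """p-Wasserstein distance between uniform empirical measures on x and y."""
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1) ** p
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    return exact_ot(a, b, cost) ** (1.0 / p)


def hierarchical_wasserstein(src_groups, tgt_groups, src_w, tgt_w, p=2):
    """Outer OT between groups; the ground cost is the inner W_p^p."""
    cost = np.array([[wasserstein(s, t, p) ** p for t in tgt_groups]
                     for s in src_groups])
    a = np.asarray(src_w, dtype=float)
    b = np.asarray(tgt_w, dtype=float)
    return exact_ot(a, b, cost) ** (1.0 / p)


if __name__ == "__main__":
    # Toy data: two source classes; the target clusters are the same
    # classes shifted by one unit along the first axis.
    src = [np.array([[0.0, 0.0], [0.0, 1.0]]),
           np.array([[5.0, 0.0], [5.0, 1.0]])]
    tgt = [g + np.array([1.0, 0.0]) for g in src]
    print(hierarchical_wasserstein(src, tgt, [0.5, 0.5], [0.5, 0.5]))
```

In this toy case the outer plan pairs each source class with its shifted copy (inner cost 1) rather than with the distant group (inner cost 16 or 36), so the printed value should be close to 1.0. The outer transport plan is what exposes which groups are matched to which.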
