Paper Title

On Multi-Domain Long-Tailed Recognition, Imbalanced Domain Generalization and Beyond

Authors

Yuzhe Yang, Hao Wang, Dina Katabi

Abstract

Real-world data often exhibit imbalanced label distributions. Existing studies on data imbalance focus on single-domain settings, i.e., samples are from the same data distribution. However, natural data can originate from distinct domains, where a minority class in one domain could have abundant instances from other domains. We formalize the task of Multi-Domain Long-Tailed Recognition (MDLT), which learns from multi-domain imbalanced data, addresses label imbalance, domain shift, and divergent label distributions across domains, and generalizes to all domain-class pairs. We first develop the domain-class transferability graph, and show that such transferability governs the success of learning in MDLT. We then propose BoDA, a theoretically grounded learning strategy that tracks the upper bound of transferability statistics, and ensures balanced alignment and calibration across imbalanced domain-class distributions. We curate five MDLT benchmarks based on widely-used multi-domain datasets, and compare BoDA to twenty algorithms that span different learning strategies. Extensive and rigorous experiments verify the superior performance of BoDA. Further, as a byproduct, BoDA establishes new state-of-the-art on Domain Generalization benchmarks, highlighting the importance of addressing data imbalance across domains, which can be crucial for improving generalization to unseen domains. Code and data are available at: https://github.com/YyzHarry/multi-domain-imbalance.
