They are Not Completely Useless: Towards Recycling Transferable Unlabeled Data for Class-Mismatched Semi-Supervised Learning

Paper Title

They are Not Completely Useless: Towards Recycling Transferable Unlabeled Data for Class-Mismatched Semi-Supervised Learning

Authors

Huang, Zhuo; Tai, Ying; Wang, Chengjie; Yang, Jian; Gong, Chen

Abstract

Semi-Supervised Learning (SSL) with mismatched classes deals with the problem that the classes of interest in the limited labeled data are only a subset of the classes in the massive unlabeled data. As a result, the classes that appear only in the unlabeled data may mislead classifier training and thus hinder the real-world deployment of various SSL methods. To solve this problem, existing methods usually divide the unlabeled data into in-distribution (ID) data and out-of-distribution (OOD) data, and directly discard or down-weight the OOD data to avoid their adverse impact. In other words, they treat OOD data as completely useless, and thus totally ignore the potentially valuable information for classification that these data contain. To remedy this defect, this paper proposes a "Transferable OOD data Recycling" (TOOR) method, which properly utilizes ID data as well as the "recyclable" OOD data to enrich the information for conducting class-mismatched SSL. Specifically, TOOR first assigns every unlabeled datum to either ID data or OOD data, and the ID data are used directly for training. The OOD data that have a close relationship with the ID data and labeled data are then treated as recyclable, and adversarial domain adaptation is employed to project them into the space of the ID data and labeled data. In other words, the recyclability of an OOD datum is evaluated by its transferability, and the recyclable OOD data are transferred so that they are compatible with the distribution of the known classes of interest. Consequently, TOOR extracts more information from unlabeled data than existing approaches, and experiments on typical benchmark datasets demonstrate its improved performance.
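The three-way split described in the abstract (ID data, recyclable OOD data, remaining OOD data) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the paper's procedure: the abstract does not state how ID and OOD data are separated or how transferability is scored, so the sketch uses a maximum-softmax confidence threshold for the ID test and takes a hypothetical per-sample `domain_scores` value in [0, 1] as a stand-in for transferability. The function name and both thresholds are likewise hypothetical, and the adversarial domain adaptation step itself is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def partition_unlabeled(logits_batch, domain_scores,
                        id_threshold=0.9, recycle_threshold=0.5):
    """Split unlabeled samples into ID, recyclable OOD, and discarded OOD.

    Assumptions (not taken from the paper):
      - a sample counts as ID when its max softmax probability is at
        least id_threshold;
      - domain_scores[i] is a hypothetical transferability score, and an
        OOD sample is recyclable when it is at least recycle_threshold.
    Returns three lists of sample indices.
    """
    id_idx, recyclable_idx, discarded_idx = [], [], []
    for i, (logits, score) in enumerate(zip(logits_batch, domain_scores)):
        confidence = max(softmax(logits))
        if confidence >= id_threshold:
            id_idx.append(i)          # used directly for training
        elif score >= recycle_threshold:
            recyclable_idx.append(i)  # candidate for domain adaptation
        else:
            discarded_idx.append(i)   # treated as non-transferable OOD
    return id_idx, recyclable_idx, discarded_idx


if __name__ == "__main__":
    logits_batch = [[8.0, 0.0, 0.0], [1.0, 0.5, 0.0], [0.2, 0.1, 0.0]]
    domain_scores = [0.1, 0.8, 0.2]
    print(partition_unlabeled(logits_batch, domain_scores))
```

In this toy input, the first sample is confidently classified and goes to the ID set, the second is uncertain but has a high assumed transferability score and is marked recyclable, and the third is neither and is set aside.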
