Paper title
Revisiting Vicinal Risk Minimization for Partially Supervised Multi-Label Classification Under Data Scarcity
Paper authors
Paper abstract
Due to the high human cost of annotation, it is non-trivial to curate a large-scale medical dataset that is fully labeled for all classes of interest. Instead, it would be convenient to collect multiple small partially labeled datasets from different matching sources, where the medical images may have only been annotated for a subset of classes of interest. This paper offers an empirical understanding of an under-explored problem, namely partially supervised multi-label classification (PSMLC), where a multi-label classifier is trained with only partially labeled medical images. In contrast to the fully supervised counterpart, the partial supervision caused by medical data scarcity has non-trivial negative impacts on the model performance. A potential remedy could be augmenting the partial labels. Though vicinal risk minimization (VRM) has been a promising solution to improve the generalization ability of the model, its application to PSMLC remains an open question. To bridge the methodological gap, we provide the first VRM-based solution to PSMLC. The empirical results also provide insights into future research directions on partially supervised learning under data scarcity.
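The abstract does not say how VRM is adapted to partial labels, so the following is only an illustrative sketch and not the paper's method. It assumes mixup as the VRM instance (a well-known one) and a hypothetical per-class annotation mask `m`, where `m[i, c] = 1` means class `c` was annotated for image `i`. The function names `masked_bce` and `mixup_partial`, the toy data, and the rule of keeping a mixed label only where both source samples are annotated are all assumptions made for illustration.

```python
import numpy as np

def masked_bce(probs, labels, mask, eps=1e-7):
    """Binary cross-entropy averaged over annotated (mask == 1) entries only."""
    p = np.clip(probs, eps, 1.0 - eps)
    per_entry = -(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))
    # Unannotated classes contribute nothing to the loss.
    return float((per_entry * mask).sum() / max(mask.sum(), 1.0))

def mixup_partial(x, y, m, lam, perm):
    """Mixup adapted to partial labels (an assumed rule, for illustration).

    Inputs and labels are convexly combined with a permuted copy of the batch;
    a mixed label is kept only where BOTH source samples are annotated.
    """
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    m_mix = m * m[perm]
    return x_mix, y_mix, m_mix

# Toy batch: 4 "images" with 8 features each, 3 classes, partially annotated.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
y = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]], dtype=float)
m = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)

lam = rng.beta(0.4, 0.4)          # mixing coefficient, as in standard mixup
perm = rng.permutation(len(x))    # partner sample for each item in the batch
x_mix, y_mix, m_mix = mixup_partial(x, y, m, lam, perm)

probs = np.full(y.shape, 0.5)     # stand-in for classifier outputs
loss = masked_bce(probs, y_mix, m_mix)
```

Intersecting the masks is a conservative choice: it never invents a label for an unannotated class, but it shrinks the supervised signal further. That trade-off is one reason applying VRM under partial supervision is not straightforward.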