Decision Making for Hierarchical Multi-label Classification with Multidimensional Local Precision Rate

Paper Title

Decision Making for Hierarchical Multi-label Classification with Multidimensional Local Precision Rate

Authors

Yuting Ye, Christine Ho, Ci-Ren Jiang, Wayne Tai Lee, Haiyan Huang

Abstract

Hierarchical multi-label classification (HMC) has drawn increasing attention in the past few decades. It is applicable when hierarchical relationships among classes are available and need to be incorporated along with the multi-label classification whereby each object is assigned to one or more classes. There are two key challenges in HMC: i) optimizing the classification accuracy, and meanwhile ii) ensuring the given class hierarchy. To address these challenges, in this article, we introduce a new statistic called the multidimensional local precision rate (mLPR) for each object in each class. We show that classification decisions made by simply sorting objects across classes in descending order of their true mLPRs can, in theory, ensure the class hierarchy and lead to the maximization of CATCH, an objective function we introduce that is related to the area under a hit curve. This approach is the first of its kind that handles both challenges in one objective function without additional constraints, thanks to the desirable statistical properties of CATCH and mLPR. In practice, however, true mLPRs are not available. In response, we introduce HierRank, a new algorithm that maximizes an empirical version of CATCH using estimated mLPRs while respecting the hierarchy. The performance of this approach was evaluated on a synthetic data set and two real data sets; ours was found to be superior to several comparison methods on evaluation criteria based on metrics such as precision, recall, and $F_1$ score.
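
The ranking idea in the abstract can be illustrated with a small sketch. This is not HierRank and not the paper's definition of CATCH or mLPR: the scores below are made-up stand-ins for estimated mLPRs, the helper names (`rank_pairs`, `hit_curve`, `normalized_area`, `respects_hierarchy`) are hypothetical, and the normalized area is only one assumed way to summarize a hit curve. It shows (object, class) pairs being sorted in descending order of score, the cumulative hit curve that results, and a check that every pair is ranked after its parent-class pair.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (object id, class id); both ids are hypothetical


def rank_pairs(scores: List[float]) -> List[int]:
    """Return pair indices sorted by descending score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def hit_curve(order: List[int], labels: List[int]) -> List[int]:
    """Cumulative number of true (object, class) pairs along the ranking."""
    hits, total = [], 0
    for i in order:
        total += labels[i]
        hits.append(total)
    return hits


def normalized_area(hits: List[int]) -> float:
    """Area under the hit curve, scaled to [0, 1] (an assumed normalization)."""
    if not hits or hits[-1] == 0:
        return 0.0
    return sum(hits) / (len(hits) * hits[-1])


def respects_hierarchy(order: List[int], pairs: List[Pair],
                       parent: Dict[str, str]) -> bool:
    """True if each (object, class) appears after (object, parent class)."""
    seen = set()
    for i in order:
        obj, cls = pairs[i]
        if cls in parent and (obj, parent[cls]) not in seen:
            return False
        seen.add((obj, cls))
    return True


# Toy data: class "A" is the parent of class "B"; scores are invented.
pairs = [("x1", "A"), ("x1", "B"), ("x2", "A"), ("x2", "B")]
scores = [0.9, 0.6, 0.7, 0.2]
labels = [1, 1, 1, 0]          # 1 = object truly belongs to the class
parent = {"B": "A"}

order = rank_pairs(scores)
hits = hit_curve(order, labels)
print([pairs[i] for i in order])
print(hits, normalized_area(hits))
print(respects_hierarchy(order, pairs, parent))
```

In this toy case each child score is below its parent score for the same object, so plain sorting already yields a hierarchy-consistent ranking; with estimated scores that need not hold, which is the situation the abstract says HierRank addresses.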
