Paper Title
Robust Multi-class Feature Selection via $l_{2,0}$-Norm Regularization Minimization
Authors
Abstract
Feature selection is an important data pre-processing step in data mining and machine learning, which can reduce the feature dimension without degrading the model's performance. Recently, sparse-regression-based feature selection methods have received considerable attention due to their good performance. However, because the $l_{2,0}$-norm regularization term is non-convex, the resulting problem is very hard to solve. In this paper, unlike most other methods, which solve only an approximate problem, a novel method based on homotopy iterative hard thresholding (HIHT) is proposed to solve the $l_{2,0}$-norm regularized least squares problem directly for multi-class feature selection, which can produce an exact row-sparse solution for the weight matrix. Moreover, to reduce the computational time of HIHT, an accelerated version of HIHT (AHIHT) is derived. Extensive experiments on eight biological datasets show that the proposed method achieves higher classification accuracy (ACC) with the fewest selected features (No.fea) compared with its approximate convex counterparts and state-of-the-art feature selection methods. The robustness of the classification accuracy to the regularization parameter and to the number of selected features is also demonstrated.
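The idea of row-wise hard thresholding with a homotopy (continuation) schedule can be illustrated with a minimal sketch. The abstract does not state the update rules, so everything below is an assumption rather than the authors' algorithm: the objective is taken to be $\frac{1}{2}\|XW - Y\|_F^2 + \lambda \|W\|_{2,0}$, the step size is $1/L$ with $L$ slightly above the squared spectral norm of $X$, and $\lambda$ is decreased geometrically from a value that zeroes every row. The function names (`row_hard_threshold`, `iht_l20`, `homotopy_iht_l20`, `selected_features`) and the parameter `rho` are hypothetical, and the accelerated variant (AHIHT) is not shown.

```python
import numpy as np


def row_hard_threshold(V, lam, L):
    """Zero every row of V whose squared l2 norm is at most 2*lam/L.

    This is the exact minimizer of (L/2)*||W - V||_F^2 + lam*||W||_{2,0}:
    keeping a row costs lam, dropping it costs (L/2)*||row||^2.
    """
    keep = np.sum(V * V, axis=1) > 2.0 * lam / L
    return V * keep[:, None]


def iht_l20(X, Y, lam, L, W0, n_iter=200, tol=1e-8):
    """Plain iterative hard thresholding for a fixed lam, warm-started at W0."""
    W = W0.copy()
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y)  # gradient of 0.5*||XW - Y||_F^2
        W_new = row_hard_threshold(W - grad / L, lam, L)
        done = np.linalg.norm(W_new - W) <= tol * max(1.0, np.linalg.norm(W))
        W = W_new
        if done:
            break
    return W


def homotopy_iht_l20(X, Y, lam_target, rho=0.5, n_iter=200):
    """Assumed continuation scheme: solve for a decreasing sequence of lam
    values, warm-starting each stage from the previous solution.
    X is assumed to be a nonzero (n_samples, n_features) matrix and
    Y an (n_samples, n_classes) target matrix, e.g. one-hot labels.
    """
    if lam_target <= 0 or not (0.0 < rho < 1.0):
        raise ValueError("need lam_target > 0 and 0 < rho < 1")
    L = 1.01 * np.linalg.norm(X, 2) ** 2  # just above the Lipschitz constant
    # Starting value chosen so that (up to rounding) no row survives from W = 0.
    lam_start = np.max(np.sum((X.T @ Y) ** 2, axis=1)) / (2.0 * L)
    lam = max(lam_target, lam_start)
    W = np.zeros((X.shape[1], Y.shape[1]))
    while True:
        W = iht_l20(X, Y, lam, L, W, n_iter=n_iter)
        if lam <= lam_target:
            return W
        lam = max(rho * lam, lam_target)


def selected_features(W):
    """Indices of the nonzero rows of W, i.e. the selected features."""
    return np.flatnonzero(np.linalg.norm(W, axis=1) > 0)
```

Because the thresholding step sets whole rows of the weight matrix exactly to zero, the selected features can be read off directly from the nonzero rows, with no post-hoc ranking or cutoff; a larger `lam_target` yields fewer selected features.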