Paper Title

IVFS: Simple and Efficient Feature Selection for High Dimensional Topology Preservation

Authors

Xiaoyun Li, Chengxi Wu, Ping Li

Abstract

Feature selection is an important tool for dealing with high-dimensional data. In the unsupervised case, many popular algorithms aim to maintain the structure of the original data. In this paper, we propose a simple and effective feature selection algorithm that enhances sample similarity preservation through a new perspective, topology preservation, which is represented by persistence diagrams from computational topology. The method is built on a unified feature selection framework called IVFS, which is inspired by the random subset method. The scheme is flexible and can handle cases where the problem is analytically intractable. The proposed algorithm preserves both the pairwise distances and the topological patterns of the full data well. We demonstrate that our algorithm provides satisfactory performance under a sharp sub-sampling rate, which supports efficient implementation of the proposed method on large-scale datasets. Extensive experiments validate the effectiveness of the proposed feature selection scheme.
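
The abstract does not spell out the procedure, so the following is only a minimal sketch of what a random-subset feature scoring scheme of this general kind could look like. It is not the authors' implementation. All function names, parameters, and defaults are hypothetical. The loss used here is a simple max-norm gap between normalized pairwise distance matrices, swapped in as a stand-in for a persistence-diagram distance, which would require a computational topology library.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from rounding

def random_subset_feature_scores(X, n_rounds=200, n_feat_sub=None,
                                 n_samp_sub=None, seed=0):
    """Score each feature by the average loss of the random subsets containing it.

    Assumption (not from the abstract): lower score = better feature.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    n_feat_sub = n_feat_sub or max(1, d // 2)   # hypothetical default
    n_samp_sub = n_samp_sub or max(2, n // 4)   # hypothetical sub-sampling rate
    loss_sum = np.zeros(d)
    counts = np.zeros(d)
    for _ in range(n_rounds):
        rows = rng.choice(n, size=n_samp_sub, replace=False)
        cols = rng.choice(d, size=n_feat_sub, replace=False)
        full = pairwise_distances(X[rows])
        sub = pairwise_distances(X[np.ix_(rows, cols)])
        # Normalize so the two matrices are comparable in scale.
        full = full / (full.max() + 1e-12)
        sub = sub / (sub.max() + 1e-12)
        # Stand-in loss: largest entrywise gap between the distance matrices.
        loss = np.abs(full - sub).max()
        loss_sum[cols] += loss
        counts[cols] += 1
    scores = np.full(d, np.inf)  # features never sampled get the worst score
    seen = counts > 0
    scores[seen] = loss_sum[seen] / counts[seen]
    return scores

def select_features(X, k, **kwargs):
    """Return the indices of the k features with the lowest average loss."""
    scores = random_subset_feature_scores(X, **kwargs)
    return np.argsort(scores)[:k]
```

Calling `select_features(X, k)` on an `(n_samples, n_features)` array returns `k` column indices. Because each round touches only a small random block of rows and columns, the cost per round stays low even when the full data matrix is large, which is the practical appeal of sub-sampling that the abstract points to.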
