Paper Title


A Feature Selection Method that Controls the False Discovery Rate

Authors

Mehdi Rostami, Olli Saarela

Abstract


The problem of selecting a handful of truly relevant variables in supervised machine learning is challenging: the assumptions that must hold are untestable, and theoretical assurances that selection errors are under control are typically unavailable. We propose a distribution-free feature selection method, referred to as Data Splitting Selection (DSS), which controls the False Discovery Rate (FDR) of feature selection while achieving high power. A second version of DSS is also proposed that attains higher power while "almost" controlling the FDR. No assumptions are made on the distribution of the response or on the joint distribution of the features. Extensive simulations compare the performance of the proposed methods with existing ones.
