A direct extension of Azadkia & Chatterjee's rank correlation to multi-response vectors

Paper title


A direct extension of Azadkia & Chatterjee's rank correlation to multi-response vectors

Authors

Jonathan Ansari, Sebastian Fuchs

Abstract


Recently, Chatterjee (2023) recognized the lack of a direct generalization of his rank correlation $ξ$ in Azadkia and Chatterjee (2021) to a multi-dimensional response vector. As a natural solution to this problem, we here propose an extension of $ξ$ that is applicable to a set of $q \geq 1$ response variables, where our approach builds upon converting the original vector-valued problem into a univariate problem and then applying the rank correlation $ξ$ to it. Our novel measure $T$ quantifies the scale-invariant extent of functional dependence of a response vector $\mathbf{Y} = (Y_1,\dots,Y_q)$ on predictor variables $\mathbf{X} = (X_1, \dots,X_p)$, characterizes independence of $\mathbf{X}$ and $\mathbf{Y}$ as well as perfect dependence of $\mathbf{Y}$ on $\mathbf{X}$ and hence fulfills all the characteristics of a measure of predictability. Aiming at maximum interpretability, we provide various invariance results for $T$ as well as a closed-form expression in multivariate normal models. Building upon the graph-based estimator for $ξ$ in Azadkia and Chatterjee (2021), we obtain a non-parametric, strongly consistent estimator for $T$ and show, as a main contribution, its asymptotic normality. Based on this estimator, we develop a model-free and rank-based feature ranking and forward feature selection for multiple-outcome data that works without any tuning parameters. Simulation results and real case studies illustrate $T$'s broad applicability.
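The abstract builds on the graph-based (nearest-neighbour) estimator of $ξ$ from Azadkia and Chatterjee (2021). The Python sketch below illustrates only that single-response building block, assuming its unconditional form $T_n = \sum_i (n \min(R_i, R_{N(i)}) - L_i^2) / \sum_i L_i (n - L_i)$, with $R_i = \#\{j : Y_j \leq Y_i\}$, $L_i = \#\{j : Y_j \geq Y_i\}$ and $N(i)$ the Euclidean nearest neighbour of $\mathbf{X}_i$. It is not the authors' estimator of $T$ for a response vector, whose exact form this abstract does not give. The names `xi_nn` and `forward_select` are hypothetical. Nearest-neighbour ties are broken by lowest index rather than at random, and the greedy loop is a single-response, FOCI-style illustration of tuning-free forward selection, not the paper's multi-outcome procedure.

```python
import numpy as np


def xi_nn(y, x):
    """Nearest-neighbour estimate T_n(Y, X) of the Azadkia-Chatterjee coefficient.

    Sketch assumptions: a single continuous response y (no ties), predictors
    x of shape (n,) or (n, p), Euclidean distance, and nearest-neighbour ties
    broken by the lowest index instead of uniformly at random.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D predictor as an (n, 1) matrix
    n = y.shape[0]
    # R_i = #{j : y_j <= y_i} and L_i = #{j : y_j >= y_i}
    r_counts = (y[None, :] <= y[:, None]).sum(axis=1)
    l_counts = (y[None, :] >= y[:, None]).sum(axis=1)
    # Squared Euclidean distances; a point may not be its own neighbour
    d = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)  # N(i): index of the nearest neighbour of x_i
    num = (n * np.minimum(r_counts, r_counts[nn]) - l_counts ** 2).sum()
    den = (l_counts * (n - l_counts)).sum()
    return num / den


def forward_select(y, x):
    """Hypothetical greedy forward selection scored by xi_nn (single response).

    Adds, one at a time, the column that maximises the estimate on the
    selected set and stops as soon as no candidate increases it, so there
    is no tuning parameter.
    """
    x = np.asarray(x, dtype=float)
    selected, best = [], -np.inf
    remaining = list(range(x.shape[1]))
    while remaining:
        scores = {k: xi_nn(y, x[:, selected + [k]]) for k in remaining}
        k_best = max(scores, key=scores.get)
        if scores[k_best] <= best:
            break  # no candidate improves the score
        selected.append(k_best)
        remaining.remove(k_best)
        best = scores[k_best]
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=(500, 3))
    y = x[:, 0] ** 2  # depends on column 0 only, and non-monotonically
    print(round(xi_nn(y, x[:, 0]), 3), round(xi_nn(y, x[:, 1]), 3))
    print(forward_select(y, x))
```

With `x` uniform on $[-1, 1]$, the estimate for `y = x**2` against its own predictor is close to 1 even though the Pearson correlation of `x` and `x**2` is near 0, which illustrates why such coefficients measure functional rather than merely linear dependence.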
