Paper Title

Opening the black-box of Neighbor Embedding with Hotelling's T2 statistic and Q-residuals

Authors

Roman Josef Rainer, Michael Mayr, Johannes Himmelbauer, Ramin Nikzad-Langerodi

Abstract

In contrast to classical techniques for exploratory analysis of high-dimensional data sets, such as principal component analysis (PCA), neighbor embedding (NE) techniques tend to better preserve the local structure/topology of high-dimensional data. However, the ability to preserve local structure comes at the expense of interpretability: techniques such as t-Distributed Stochastic Neighbor Embedding (t-SNE) or Uniform Manifold Approximation and Projection (UMAP) do not give insights into which input variables underlie the topological (cluster) structure seen in the corresponding embedding. Here we propose different "tricks" from the chemometrics field, based on PCA, Q-residuals and Hotelling's T2 contributions, in combination with novel visualization approaches to derive local and global explanations of neighbor embeddings. We show how our approach is capable of identifying discriminatory features between groups of data points that remain unnoticed when exploring NEs using standard univariate or multivariate approaches.
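
To make the two statistics named in the abstract concrete, the following is a minimal sketch of how Hotelling's T2 and Q-residuals, together with their per-variable contributions, are commonly computed from a PCA model in chemometrics. It is not the authors' procedure: the function name `pca_t2_q`, the synthetic data, and the choice of two components are assumptions made purely for illustration, and the T2 contribution formula used here is one of several conventions found in the literature.

```python
import numpy as np

def pca_t2_q(X, n_components):
    """Hypothetical helper: T2, Q and per-variable contributions from a PCA of X."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # mean-center each variable
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                      # loadings, shape (p, k)
    lam = s[:n_components] ** 2 / (X.shape[0] - 1)  # variance of each score
    T = Xc @ P                                   # scores, shape (n, k)
    E = Xc - T @ P.T                             # residuals outside the PCA subspace

    t2 = np.sum(T ** 2 / lam, axis=1)            # Hotelling's T2 per sample
    q = np.sum(E ** 2, axis=1)                   # Q-residual (squared error) per sample

    # Per-variable contributions; each row sums to the sample's statistic.
    # T2 contributions may be negative for individual variables.
    t2_contrib = Xc * ((T / lam) @ P.T)
    q_contrib = E ** 2
    return t2, q, t2_contrib, q_contrib

# Synthetic data (an assumption): 20 of 200 samples are shifted in variable 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:20, 2] += 4.0

t2, q, t2_contrib, q_contrib = pca_t2_q(X, n_components=2)

# Average contributions of the shifted group, one value per input variable.
group_t2_profile = t2_contrib[:20].mean(axis=0)
group_q_profile = q_contrib[:20].mean(axis=0)
```

Comparing such contribution profiles between groups of points (for example, clusters selected in an embedding) is one way to ask which input variables set a group apart. How the paper combines these quantities with t-SNE or UMAP is not specified in the abstract.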
