AVIDA: Alternating method for Visualizing and Integrating Data

Paper title

AVIDA: Alternating method for Visualizing and Integrating Data

Authors

Kathryn Dover, Zixuan Cang, Anna Ma, Qing Nie, Roman Vershynin

Abstract

High-dimensional multimodal data arises in many scientific fields. The integration of multimodal data becomes challenging when there is no known correspondence between the samples and the features of different datasets. To tackle this challenge, we introduce AVIDA, a framework for simultaneously performing data alignment and dimension reduction. In the numerical experiments, Gromov-Wasserstein optimal transport and t-distributed stochastic neighbor embedding are used as the alignment and dimension reduction modules respectively. We show that AVIDA correctly aligns high-dimensional datasets without common features with four synthesized datasets and two real multimodal single-cell datasets. Compared to several existing methods, we demonstrate that AVIDA better preserves structures of individual datasets, especially distinct local structures in the joint low-dimensional visualization, while achieving comparable alignment performance. Such a property is important in multimodal single-cell data analysis as some biological processes are uniquely captured by one of the datasets. In general applications, other methods can be used for the alignment and dimension reduction modules.
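The abstract names Gromov-Wasserstein optimal transport as the alignment module for datasets that share no features. The sketch below illustrates why that is possible: the coupling is computed only from pairwise distances within each dataset, so the two feature spaces never need to be compared directly. This is not the authors' implementation. It is a minimal NumPy sketch of an entropic Gromov-Wasserstein solver with a square loss and uniform sample weights, and the function names (`pairwise_dist`, `sinkhorn`, `entropic_gw`), the regularization strength `eps`, and the iteration counts are all assumptions chosen for illustration.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix within one dataset, scaled so the max is 1."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    d = np.sqrt(d2)
    return d / d.max()

def sinkhorn(p, q, cost, eps, n_iter=200):
    """Entropic optimal transport between weights p and q for a fixed cost."""
    # Shifting the cost by a constant does not change the optimal coupling.
    K = np.exp(-(cost - cost.min()) / eps)
    u = np.ones_like(p)
    for _ in range(n_iter):
        v = q / (K.T @ u)
        u = p / (K @ v)
    return u[:, None] * K * v[None, :]

def entropic_gw(C1, C2, eps=0.05, n_outer=30):
    """Alternate between linearizing the GW objective and a Sinkhorn solve.

    C1 (n x n) and C2 (m x m) are intra-dataset distance matrices.
    Returns an n x m coupling T; T[i, j] is the mass matched between
    sample i of the first dataset and sample j of the second.
    """
    n, m = C1.shape[0], C2.shape[0]
    p = np.full(n, 1.0 / n)  # uniform weights (an assumption)
    q = np.full(m, 1.0 / m)
    T = np.outer(p, q)
    # Part of the square-loss cost that does not depend on T.
    const = ((C1**2) @ p)[:, None] + ((C2**2) @ q)[None, :]
    for _ in range(n_outer):
        cost = const - 2.0 * C1 @ T @ C2.T
        T = sinkhorn(p, q, cost, eps)
    return T

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))  # dataset 1: 30 samples, 10 features
Y = rng.normal(size=(25, 4))   # dataset 2: 25 samples, 4 unrelated features
T = entropic_gw(pairwise_dist(X), pairwise_dist(Y))
```

In a framework like the one the abstract describes, a coupling of this kind would be combined with a dimension reduction step such as t-SNE. How the two modules are interleaved is specified in the paper itself and is not reproduced here.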
