Causal Structure Discovery between Clusters of Nodes Induced by Latent Factors

Paper Title

Causal Structure Discovery between Clusters of Nodes Induced by Latent Factors

Authors

Chandler Squires, Annie Yun, Eshaan Nichani, Raj Agrawal, Caroline Uhler

Abstract

We consider the problem of learning the structure of a causal directed acyclic graph (DAG) model in the presence of latent variables. We define latent factor causal models (LFCMs) as a restriction on causal DAG models with latent variables, which are composed of clusters of observed variables that share the same latent parent and connections between these clusters given by edges pointing from the observed variables to latent variables. LFCMs are motivated by gene regulatory networks, where regulatory edges, corresponding to transcription factors, connect spatially clustered genes. We show identifiability results on this model and design a consistent three-stage algorithm that discovers clusters of observed nodes, a partial ordering over clusters, and finally, the entire structure over both observed and latent nodes. We evaluate our method in a synthetic setting, demonstrating its ability to almost perfectly recover the ground truth clustering even at relatively low sample sizes, as well as the ability to recover a significant number of the edges from observed variables to latent factors. Finally, we apply our method in a semi-synthetic setting to protein mass spectrometry data with a known ground truth network, and achieve almost perfect recovery of the ground truth variable clusters.
