Paper title
Causal learning with sufficient statistics: an information bottleneck approach
Paper authors
Paper abstract
The inference of causal relationships from observational data of partially observed multivariate systems with hidden variables is a fundamental question in many scientific domains. Methods that extract causal information from conditional independencies between the variables of a system are common tools for this purpose, but they are limited when such independencies are absent. To surmount this limitation, we capitalize on the fact that the laws governing the generative mechanisms of a system often give rise to substructures embodied in the generative functional equation of a variable, and these substructures act as sufficient statistics for the influence that other variables have on it. These functional sufficient statistics constitute intermediate hidden variables that provide new conditional independencies to be tested. We propose to use the Information Bottleneck method, a technique commonly applied for dimensionality reduction, to find the underlying sufficient sets of statistics. Using these statistics, we formulate additional rules of causal orientation that provide causal information not obtainable from standard structure learning algorithms, which exploit only conditional independencies between observable variables. We validate the use of sufficient statistics for structure learning both with simulated systems built to contain specific sufficient statistics and with benchmark data from regulatory rules previously and independently proposed to model biological signal transduction networks.
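The central idea, that a variable may depend on its causes only through a compressed summary which an Information Bottleneck style compression can recover, can be illustrated on a toy discrete system. The sketch below is a hypothetical illustration under stated assumptions, not the authors' method: it uses the greedy agglomerative variant of the Information Bottleneck on an exactly known distribution, and the system itself (independent fair bits `X1`, `X2`, with `Y` depending on them only through `S = X1 + X2` via the made-up table `P_Y1_GIVEN_S`) as well as all function names are assumptions chosen for illustration.

```python
import math
from itertools import product

# Hypothetical toy system (an assumption, not from the paper): X1 and X2 are
# independent fair bits, and Y depends on (X1, X2) only through S = X1 + X2.
P_Y1_GIVEN_S = {0: 0.1, 1: 0.5, 2: 0.9}  # hypothetical P(Y = 1 | S)


def joint_xy():
    """Exact joint p(x, y) with x = (x1, x2) and y in {0, 1}."""
    p = {}
    for x1, x2, y in product((0, 1), repeat=3):
        q = P_Y1_GIVEN_S[x1 + x2]
        p[((x1, x2), y)] = 0.25 * (q if y == 1 else 1 - q)
    return p


def mutual_information(p_ty):
    """I(T; Y) in bits from a dict {(t, y): probability}."""
    pt, py = {}, {}
    for (t, y), v in p_ty.items():
        pt[t] = pt.get(t, 0.0) + v
        py[y] = py.get(y, 0.0) + v
    return sum(v * math.log2(v / (pt[t] * py[y]))
               for (t, y), v in p_ty.items() if v > 0)


def kl(p, q):
    """KL divergence D(p || q) in bits for two aligned probability lists."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def agglomerative_ib(p_xy, n_clusters):
    """Greedily merge values of X into n_clusters groups, each step merging the
    pair whose union loses the least information about Y."""
    ys = sorted({y for _, y in p_xy})
    # One singleton cluster per x value: (members, p(t), [p(y | t) for y in ys]).
    clusters = []
    for x in sorted({x for x, _ in p_xy}):
        w = sum(p_xy[(x, y)] for y in ys)
        clusters.append(([x], w, [p_xy[(x, y)] / w for y in ys]))
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                _, wi, ci = clusters[i]
                _, wj, cj = clusters[j]
                w = wi + wj
                mix = [(wi * a + wj * b) / w for a, b in zip(ci, cj)]
                # Information loss of the merge: (wi + wj) * weighted JS divergence.
                cost = wi * kl(ci, mix) + wj * kl(cj, mix)
                if best is None or cost < best[0]:
                    best = (cost, i, j, w, mix)
        _, i, j, w, mix = best
        merged = (clusters[i][0] + clusters[j][0], w, mix)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(members) for members, _, _ in clusters]


def compress(p_xy, clusters):
    """Joint p(t, y) obtained by mapping each x to the index of its cluster."""
    label = {x: k for k, members in enumerate(clusters) for x in members}
    p_ty = {}
    for (x, y), v in p_xy.items():
        p_ty[(label[x], y)] = p_ty.get((label[x], y), 0.0) + v
    return p_ty


if __name__ == "__main__":
    p = joint_xy()
    clusters = agglomerative_ib(p, 3)
    print(clusters)  # [[(0, 0)], [(1, 1)], [(0, 1), (1, 0)]]
    print(mutual_information(p), mutual_information(compress(p, clusters)))
```

Compressing the four joint values of `(X1, X2)` into three clusters merges `(0, 1)` with `(1, 0)`, so the clusters coincide with the levels of `S`, and the two printed mutual informations agree. Because the compressed variable `T` is a function of `X = (X1, X2)`, the chain rule gives I(X; Y) = I(T; Y) + I(X; Y | T), so equality means Y is independent of (X1, X2) given T: an extra conditional independence involving an intermediate hidden variable. With estimated rather than exact distributions, one would instead work with the Information Bottleneck trade-off between I(X; T) and I(T; Y) and a statistical test; this sketch omits both.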