Paper title
Enhancing Causal Estimation through Unlabeled Offline Data
Paper authors
Paper abstract
Consider a situation where a new patient arrives in the Intensive Care Unit (ICU) and is monitored by multiple sensors. We wish to assess relevant unmeasured physiological variables (e.g., cardiac contractility, cardiac output, and vascular resistance) that have a strong effect on the patient's diagnosis and treatment. We do not have any information about this specific patient, but extensive offline information is available about previous patients, which may be only partially related to the present patient (a case of dataset shift). This information constitutes our prior knowledge, and is both partial and approximate. The basic question is how best to use this prior knowledge, combined with online patient data, to assist in diagnosing the current patient most effectively. Our proposed approach consists of three stages: (i) Use the abundant offline data to create both a non-causal and a causal estimator for the relevant unmeasured physiological variables. (ii) Based on the non-causal estimator constructed, and a set of measurements from a new group of patients, construct a causal filter that provides higher accuracy in predicting the hidden physiological variables for this new set of patients. (iii) For any new patient arriving in the ICU, use the constructed filter to predict the relevant internal variables. Overall, this strategy allows us to make use of the abundantly available offline data to enhance causal estimation for newly arriving patients. We demonstrate the effectiveness of this methodology on a (non-medical) real-world task, in situations where the offline data is only partially related to the new observations. We provide a mathematical analysis of the merits of the approach in a linear setting of Kalman filtering and smoothing, demonstrating its utility.
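The distinction between a causal and a non-causal estimator in the linear setting mentioned above can be illustrated with a small simulation. The sketch below is not the paper's three-stage method: it only contrasts a standard scalar Kalman filter (causal, uses measurements up to the current time) with a Rauch-Tung-Striebel smoother (non-causal, uses the whole record). The model `x_t = a * x_{t-1} + w_t`, `y_t = x_t + v_t`, the parameter values `a`, `q`, `r`, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def simulate(n, a, q, r, rng):
    """Simulate a scalar linear-Gaussian state x and noisy measurements y."""
    x = np.empty(n)
    prev = rng.normal(0.0, 1.0)  # assumed initial state distribution N(0, 1)
    for t in range(n):
        prev = a * prev + rng.normal(0.0, np.sqrt(q))
        x[t] = prev
    y = x + rng.normal(0.0, np.sqrt(r), size=n)
    return x, y


def kalman_filter(y, a, q, r, m0=0.0, p0=1.0):
    """Causal estimate: the value at time t depends only on y[0..t]."""
    n = len(y)
    m_pred, p_pred = np.empty(n), np.empty(n)
    m_filt, p_filt = np.empty(n), np.empty(n)
    m, p = m0, p0
    for t in range(n):
        # Predict one step ahead from the previous posterior.
        m_pred[t] = a * m
        p_pred[t] = a * a * p + q
        # Correct the prediction with the new measurement.
        gain = p_pred[t] / (p_pred[t] + r)
        m = m_pred[t] + gain * (y[t] - m_pred[t])
        p = (1.0 - gain) * p_pred[t]
        m_filt[t], p_filt[t] = m, p
    return m_filt, p_filt, m_pred, p_pred


def rts_smoother(m_filt, p_filt, m_pred, p_pred, a):
    """Non-causal estimate: the value at time t depends on the whole record."""
    m_smooth = m_filt.copy()
    for t in range(len(m_filt) - 2, -1, -1):
        g = p_filt[t] * a / p_pred[t + 1]
        m_smooth[t] = m_filt[t] + g * (m_smooth[t + 1] - m_pred[t + 1])
    return m_smooth


# Illustrative parameter values (assumptions, not taken from the paper).
a, q, r = 0.95, 0.1, 1.0
rng = np.random.default_rng(0)
x, y = simulate(2000, a, q, r, rng)

m_filt, p_filt, m_pred, p_pred = kalman_filter(y, a, q, r)
m_smooth = rts_smoother(m_filt, p_filt, m_pred, p_pred, a)

mse_filter = float(np.mean((m_filt - x) ** 2))
mse_smoother = float(np.mean((m_smooth - x) ** 2))
print(f"causal filter MSE:       {mse_filter:.3f}")
print(f"non-causal smoother MSE: {mse_smoother:.3f}")
```

In this toy setting the smoother typically attains a lower error than the filter because it also sees future measurements. That gap is what makes a non-causal estimator built offline a plausible source of improved targets for a causal filter, although how the paper exploits it is not specified beyond the abstract.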