Paper Title

Statistical Inference for Streamed Longitudinal Data

Authors

Lan Luo, Jingshen Wang, Emily C. Hector

Abstract

Modern longitudinal data, for example from wearable devices, measures biological signals on a fixed set of participants at a diverging number of time points. Traditional statistical methods are not equipped to handle the computational burden of repeatedly analyzing the cumulatively growing dataset each time new data is collected. We propose a new estimation and inference framework for dynamic updating of point estimates and their standard errors across serially collected dependent datasets. The key technique is a decomposition of the extended score function of the quadratic inference function constructed over the cumulative longitudinal data into a sum of summary statistics over data batches. We show how this sum can be recursively updated without the need to access the whole dataset, resulting in a computationally efficient streaming procedure with minimal loss of statistical efficiency. We prove consistency and asymptotic normality of our streaming estimator as the number of data batches diverges, even as the number of independent participants remains fixed. Simulations highlight the advantages of our approach over traditional statistical methods that assume independence between data batches. Finally, we investigate the relationship between physical activity and several diseases through the analysis of accelerometry data from the National Health and Nutrition Examination Survey.
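The recursive-update idea in the abstract can be illustrated with a much simpler stand-in. The sketch below is not the authors' quadratic inference function procedure: it uses ordinary least squares with one covariate, which treats observations as independent and therefore ignores the within-participant dependence the paper is designed to handle. It only shows the general pattern of folding each batch into running summary statistics and re-estimating without revisiting earlier batches. The class name `StreamingOLS` and the toy batches are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StreamingOLS:
    """Intercept-plus-slope least squares kept as running sums.

    Illustrative stand-in only: assumes independent observations,
    unlike the dependent-data setting described in the abstract.
    """
    n: int = 0
    sx: float = 0.0   # running sum of x
    sy: float = 0.0   # running sum of y
    sxx: float = 0.0  # running sum of x * x
    sxy: float = 0.0  # running sum of x * y

    def update(self, xs, ys):
        """Fold one batch into the summaries; the batch can then be discarded."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def estimate(self):
        """Solve the 2x2 normal equations using the summaries alone."""
        det = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / det
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


def full_data_ols(xs, ys):
    """Reference fit on the pooled data, computed from centered sums."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


# Hypothetical toy batches arriving one at a time.
batches = [([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]),
           ([3.0, 4.0], [7.0, 9.0]),
           ([5.0], [11.0])]

model = StreamingOLS()
for xs, ys in batches:
    model.update(xs, ys)

all_x = [x for xs, _ in batches for x in xs]
all_y = [y for _, ys in batches for y in ys]
stream_fit = model.estimate()
pooled_fit = full_data_ols(all_x, all_y)
```

In this toy case the streamed fit matches the pooled fit because the least-squares summaries are exactly additive across batches. The abstract describes a more delicate setting, where the decomposition applies to an extended score function and some loss of statistical efficiency is expected.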
