Paper title
Turning the information-sharing dial: efficient inference from different data sources
Authors
Abstract
A fundamental aspect of statistics is the integration of data from different sources. Classically, Fisher and others focused on how to integrate homogeneous (or only mildly heterogeneous) sets of data. More recently, as data have become more accessible, the question of whether data sets from different sources should be integrated has become more relevant. The current literature treats this as a question with only two answers: integrate or don't. Here we take a different approach, motivated by information-sharing principles from the shrinkage estimation literature. In particular, we deviate from the do/don't perspective and propose a dial parameter that controls the extent to which two data sources are integrated. How far this dial parameter should be turned is shown to depend, for example, on the informativeness of the different data sources as measured by Fisher information. In the context of generalized linear models, this more nuanced data integration framework leads to relatively simple parameter estimates and valid tests/confidence intervals. Moreover, we demonstrate both theoretically and empirically that setting the dial parameter according to our recommendation leads to more efficient estimation compared to other binary data integration schemes.
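To make the idea of a dial parameter concrete, the following is a minimal sketch for the Gaussian linear model only. It is an illustration under stated assumptions, not the estimator defined in the paper: the penalty form, the function names `ols` and `dial_estimate`, and the simulated data are all hypothetical. The sketch fits source 1 by least squares while penalizing, with weight `lam`, the distance between its fitted values and those implied by the source-2 fit. Setting `lam = 0` ignores source 2 entirely, and letting `lam` grow moves the estimate all the way to the source-2 fit.

```python
import numpy as np

def ols(X, y):
    """Ordinary least-squares coefficients for y ~ X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def dial_estimate(X1, y1, X2, y2, lam):
    """Hypothetical dial estimator (an assumption, not the paper's method).

    Minimizes ||y1 - X1 b||^2 + lam * ||X1 (b - b2)||^2 over b, where b2 is
    the OLS fit on source 2. When X1'X1 is invertible, the minimizer has the
    closed form b = (b1 + lam * b2) / (1 + lam), with b1 the OLS fit on
    source 1.
    """
    b1 = ols(X1, y1)
    b2 = ols(X2, y2)
    return (b1 + lam * b2) / (1.0 + lam)

# Simulated data: two sources with slightly different true coefficients.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(50, 2))
X2 = rng.normal(size=(200, 2))
beta1 = np.array([1.0, -0.5])
beta2 = np.array([1.2, -0.4])
y1 = X1 @ beta1 + rng.normal(scale=1.0, size=50)
y2 = X2 @ beta2 + rng.normal(scale=1.0, size=200)

# Turning the dial: 0 = no integration, large values = full borrowing.
for lam in (0.0, 0.5, 2.0, 1e6):
    print(lam, dial_estimate(X1, y1, X2, y2, lam))
```

In this sketch the dial is set by hand. The abstract indicates that the recommended setting depends on how informative each source is, as measured by Fisher information; that selection rule is not reproduced here.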