Paper Title
Data Fusion for Joining Income and Consumption Information Using Different Donor-Recipient Distance Metrics
Paper Authors
Paper Abstract
Data fusion describes the method of combining data from (at least) two initially independent data sources to allow for joint analysis of variables which are not jointly observed. The fundamental idea is to base inference on identifying assumptions and on common variables, which provide information that is jointly observed in all the data sources. A popular class of methods dealing with this particular missing-data problem is based on nearest neighbour matching. However, exact matches become unlikely with increasing common information, and the specification of the distance function can influence the results of the data fusion. In this paper we compare two different approaches to nearest neighbour hot deck matching: one, Random Hot Deck, is a variant of the covariate-based matching methods that was proposed by Eurostat and can be considered a 'classical' statistical matching method, whereas the alternative approach is based on Predictive Mean Matching. We discuss results from a simulation study to investigate benefits and potential drawbacks of both variants, and our findings suggest that Predictive Mean Matching tends to outperform Random Hot Deck.
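To make the contrast between the two distance specifications concrete, the following is a minimal sketch in Python and not the implementation used in the paper. It assumes a donor file in which the common variables and the target variable are both observed, and a recipient file with only the common variables. The covariate-based function is a simplified stand-in for Random Hot Deck: it draws one donor at random among the k nearest donors in covariate space (squared Euclidean distance), whereas the Eurostat variant may define its donation classes differently. The Predictive Mean Matching function measures distance on the predicted mean of a linear regression fitted on the donors. All function names, the parameter k, and the toy data are hypothetical.

```python
import random

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(x_rows, z):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(row) for row in x_rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xtz = [sum(d[i] * zi for d, zi in zip(design, z)) for i in range(p)]
    return solve_linear_system(xtx, xtz)

def predict(beta, row):
    """Predicted mean for one record given coefficients (intercept first)."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))

def covariate_random_hot_deck(x_rec, x_don, z_don, k=5, rng=None):
    """Assumed simplification: random draw among the k nearest donors in X."""
    rng = rng or random.Random()
    imputed = []
    for xr in x_rec:
        dist = [sum((a - b) ** 2 for a, b in zip(xr, xd)) for xd in x_don]
        nearest = sorted(range(len(x_don)), key=dist.__getitem__)[:k]
        imputed.append(z_don[rng.choice(nearest)])
    return imputed

def predictive_mean_matching(x_rec, x_don, z_don):
    """Match each recipient to the donor with the closest predicted mean."""
    beta = fit_ols(x_don, z_don)
    pred_don = [predict(beta, xd) for xd in x_don]
    imputed = []
    for xr in x_rec:
        target = predict(beta, xr)
        j = min(range(len(x_don)), key=lambda i: abs(pred_don[i] - target))
        imputed.append(z_don[j])  # donate the observed value, not the prediction
    return imputed

# Toy data (hypothetical): two common variables, one target variable.
rng = random.Random(0)
x_don = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
z_don = [1.0 + 2.0 * a - 1.0 * b + rng.gauss(0, 0.5) for a, b in x_don]
x_rec = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]

z_rhd = covariate_random_hot_deck(x_rec, x_don, z_don, k=5, rng=rng)
z_pmm = predictive_mean_matching(x_rec, x_don, z_don)
```

Both functions impute only values that were actually observed among the donors, which is the defining feature of hot deck methods. They differ only in the distance used to choose the donor: the first treats every common variable equally, while the second collapses the common variables into a single score weighted by how strongly each predicts the target.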