Paper Title

Spatio-Causal Patterns of Sample Growth

Author

Ribeiro, Andre F.

Abstract

Different statistical samples (e.g., from different locations) offer populations and learning systems observations with distinct statistical properties. Samples under (1) 'Unconfounded' growth preserve systems' ability to determine the independent effects of their individual variables on any outcome-of-interest (and lead, therefore, to fair and interpretable black-box predictions). Samples under (2) 'Externally-Valid' growth preserve their ability to make predictions that generalize across out-of-sample variation. The first promotes predictions that generalize over populations, the second over their shared uncontrolled factors. We illustrate these theoretic patterns in the full American census from 1840 to 1940, and samples ranging from the street-level all the way to the national. This reveals sample requirements for generalizability over space and time, and new connections among the Shapley value, counterfactual statistics, and hyperbolic geometry.
