Paper title
Generalized Linear Models for Longitudinal Data with Biased Sampling Designs: A Sequential Offsetted Regressions Approach
Paper authors
Paper abstract
Biased sampling designs can be highly efficient when studying rare (binary) or low-variability (continuous) endpoints. We consider longitudinal data settings in which the probability of being sampled depends on a repeatedly measured response through an outcome-related auxiliary variable. Such auxiliary variable-dependent or outcome-dependent sampling improves observed response variability, and possibly exposure variability, over random sampling, even though the auxiliary variable is not of scientific interest. For analysis, we propose a generalized linear model-based approach using a sequence of two offsetted regressions. The first estimates the relationship of the auxiliary variable to response and covariate data using an offsetted logistic regression model. The offset hinges on the (assumed) known ratio of sampling probabilities for different values of the auxiliary variable. Results from the auxiliary model are used to estimate observation-specific probabilities of being sampled conditional on the response and covariates, and these probabilities are then used to account for bias in the second, target population model. We provide asymptotic standard errors that account for uncertainty in the estimation of the auxiliary model, and perform simulation studies demonstrating substantial bias reduction, correct coverage probability, and improved design efficiency over simple random sampling designs. We illustrate the approaches with two examples.
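The two-step idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration under strong simplifying assumptions, not the authors' implementation: it uses one observation per subject (no longitudinal correlation), a binary response with a logit link, a binary auxiliary variable, and it omits the asymptotic standard errors. The function names (`fit_offset_logit`, `demo`, `rho`), the simulated data, and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def expit(t):
    """Numerically stable inverse logit."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_offset_logit(X, y, offset, iters=25):
    """Newton-Raphson fit of logit P(y = 1) = offset + X beta."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi, oi in zip(X, y, offset):
            mu = expit(oi + sum(b * v for b, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def demo(seed=0, n=20000, pi1=1.0, pi0=0.1):
    """Simulate auxiliary-variable-dependent sampling (hypothetical values)."""
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = int(rng.random() < expit(-3.0 + 1.0 * x))  # rare binary response
        z = int(rng.random() < expit(-2.0 + 3.0 * y))  # outcome-related auxiliary
        if rng.random() < (pi1 if z else pi0):         # sampling depends on z only
            xs.append(x)
            ys.append(y)
            zs.append(z)

    # Regression 1: auxiliary model for Z given (Y, X), offset log(pi1 / pi0).
    off1 = [math.log(pi1 / pi0)] * len(xs)
    g = fit_offset_logit([[1.0, y, x] for x, y in zip(xs, ys)], zs, off1)

    def rho(y, x):
        """Estimated P(sampled | Y = y, X = x)."""
        lam = expit(g[0] + g[1] * y + g[2] * x)
        return pi1 * lam + pi0 * (1.0 - lam)

    # Regression 2: target model for Y given X, offset log(rho(1, x) / rho(0, x)).
    off2 = [math.log(rho(1, x) / rho(0, x)) for x in xs]
    design = [[1.0, x] for x in xs]
    sor = fit_offset_logit(design, ys, off2)
    naive = fit_offset_logit(design, ys, [0.0] * len(xs))
    return {"naive": naive, "sor": sor}
```

Calling `demo()` returns the intercept and slope from a naive logistic fit that ignores the sampling design and from the two-step offsetted fit. In this simulated setting the naive intercept should be inflated because sampled subjects are enriched for the response, while the offsetted fit should land near the values used to generate the data (intercept -3, slope 1). A longitudinal analysis would additionally need to handle within-subject correlation and propagate the uncertainty from the first regression into the standard errors, which this sketch does not attempt.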