Title

Backfitting for large scale crossed random effects regressions

Authors

Ghosh, Swarnadip, Hastie, Trevor, Owen, Art B.

Abstract


Regression models with crossed random effect errors can be very expensive to compute. The cost of both generalized least squares and Gibbs sampling can easily grow as $N^{3/2}$ (or worse) for $N$ observations. Papaspiliopoulos et al. (2020) present a collapsed Gibbs sampler that costs $O(N)$, but under an extremely stringent sampling model. We propose a backfitting algorithm to compute a generalized least squares estimate and prove that it costs $O(N)$. A critical part of the proof is in ensuring that the number of iterations required is $O(1)$ which follows from keeping a certain matrix norm below $1-\delta$ for some $\delta>0$. Our conditions are greatly relaxed compared to those for the collapsed Gibbs sampler, though still strict. Empirically, the backfitting algorithm has a norm below $1-\delta$ under conditions that are less strict than those in our assumptions. We illustrate the new algorithm on a ratings data set from Stitch Fix.
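To make the backfitting idea concrete, the following is a minimal sketch (not the authors' exact GLS algorithm) of backfitting for a two-factor additive model $y_{ij} = \mu + a_i + b_j + e_{ij}$ with crossed row and column effects. Each sweep alternately refits one factor's effects on the residuals of the other, at $O(N)$ cost per sweep; when the associated iteration-matrix norm stays below $1-\delta$, the sweeps converge geometrically so the total cost remains $O(N)$. The function name `backfit` and the simple per-level averaging update are illustrative assumptions.

```python
import numpy as np

def backfit(y, rows, cols, n_sweeps=200, tol=1e-10):
    """Backfit a two-factor additive model y ~ mu + a[rows] + b[cols].

    Each sweep costs O(N): np.bincount aggregates residuals per level
    of one factor while the other factor's current fit is held fixed.
    Convergence is geometric when the iteration's matrix norm is below
    1 - delta, so O(1) sweeps suffice and total cost is O(N).
    """
    y, rows, cols = np.asarray(y, float), np.asarray(rows), np.asarray(cols)
    mu = y.mean()
    a = np.zeros(rows.max() + 1)        # row-factor effects
    b = np.zeros(cols.max() + 1)        # column-factor effects
    n_row = np.bincount(rows)           # observations per row level
    n_col = np.bincount(cols)           # observations per column level
    for _ in range(n_sweeps):
        # Update row effects from residuals with column effects removed.
        a_new = np.bincount(rows, weights=y - mu - b[cols]) / n_row
        # Update column effects from residuals with new row effects removed.
        b_new = np.bincount(cols, weights=y - mu - a_new[rows]) / n_col
        change = max(np.abs(a_new - a).max(), np.abs(b_new - b).max())
        a, b = a_new, b_new
        if change < tol:                # geometric convergence => few sweeps
            break
    return mu, a, b
```

The effects are left uncentered here for brevity; only the fitted values $\mu + a_i + b_j$ are identified, which is all the sketch needs.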
