Title
The out-of-sample prediction error of the square-root-LASSO and related estimators
Authors
Abstract
We study the classical problem of predicting an outcome variable, $Y$, using a linear combination of a $d$-dimensional covariate vector, $\mathbf{X}$. We are interested in linear predictors whose coefficients solve: \begin{align*} \inf_{\boldsymbol{\beta} \in \mathbb{R}^d} \left( \mathbb{E}_{\mathbb{P}_n} \left[ \left(Y-\mathbf{X}^{\top}\boldsymbol{\beta}\right)^r \right] \right)^{1/r} + \delta\, \rho\left(\boldsymbol{\beta}\right), \end{align*} where $\delta>0$ is a regularization parameter, $\rho:\mathbb{R}^d\to \mathbb{R}_+$ is a convex penalty function, $\mathbb{P}_n$ is the empirical distribution of the data, and $r\geq 1$. We present three sets of new results. First, we provide conditions under which linear predictors based on these estimators solve a \emph{distributionally robust optimization} problem: they minimize the worst-case prediction error over distributions that are close to each other in a type of \emph{max-sliced Wasserstein metric}. Second, we provide a detailed finite-sample and asymptotic analysis of the statistical properties of the balls of distributions over which the worst-case prediction error is analyzed. Third, we use the distributionally robust optimality and our statistical analysis to present i) an oracle recommendation for the choice of the regularization parameter, $\delta$, that guarantees good out-of-sample prediction error; and ii) a test statistic to rank the out-of-sample performance of two different linear estimators. None of our results relies on sparsity assumptions about the true data-generating process; thus, they broaden the scope of use of the square-root LASSO and related estimators in prediction problems.
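For concreteness, the objective in the abstract can be sketched numerically. The following is a minimal illustration assuming $r=2$ and $\rho$ equal to the $\ell_1$ norm (the square-root-LASSO case); the solver choice, function name, and all numbers in the demo are illustrative assumptions, not taken from the paper:

```python
import numpy as np
from scipy.optimize import minimize

def sqrt_lasso(X, y, delta, r=2):
    """Minimize (mean |y - X @ beta|^r)^(1/r) + delta * ||beta||_1 over beta.

    This is the abstract's objective with rho taken to be the l1 norm
    (the square-root-LASSO when r = 2). Solved here with a generic
    derivative-free method purely for illustration; in practice a
    dedicated conic/convex solver is preferable.
    """
    d = X.shape[1]

    def objective(beta):
        resid = y - X @ beta
        fit = np.mean(np.abs(resid) ** r) ** (1.0 / r)  # empirical L^r norm of residuals
        return fit + delta * np.sum(np.abs(beta))       # plus l1 penalty scaled by delta

    # Powell handles the non-smooth l1 term reasonably for small d.
    return minimize(objective, np.zeros(d), method="Powell").x

# Small synthetic demo (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
beta_true = np.array([2.0, 0.0, -1.0])
y = X @ beta_true + 0.1 * rng.normal(size=200)
b_hat = sqrt_lasso(X, y, delta=0.05)
```

With a small `delta` the fit term dominates and `b_hat` lands near the true coefficients; larger `delta` shrinks the estimates toward zero, which is the trade-off the regularization parameter controls.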