Paper Title

Transfer Learning for High-dimensional Linear Regression: Prediction, Estimation, and Minimax Optimality

Authors

Sai Li, T. Tony Cai, Hongzhe Li

Abstract

This paper considers the estimation and prediction of a high-dimensional linear regression in the setting of transfer learning, using samples from the target model as well as auxiliary samples from different but possibly related regression models. When the set of "informative" auxiliary samples is known, an estimator and a predictor are proposed and their optimality is established. The optimal rates of convergence for prediction and estimation are faster than the corresponding rates without using the auxiliary samples. This implies that knowledge from the informative auxiliary samples can be transferred to improve the learning performance of the target problem. In the case that the set of informative auxiliary samples is unknown, we propose a data-driven procedure for transfer learning, called Trans-Lasso, and reveal its robustness to non-informative auxiliary samples and its efficiency in knowledge transfer. The proposed procedures are demonstrated in numerical studies and are applied to a dataset concerning the associations among gene expressions. It is shown that Trans-Lasso leads to improved performance in gene expression prediction in a target tissue by incorporating the data from multiple different tissues as auxiliary samples.
