High heritability does not imply accurate prediction under the small additive effects hypothesis

Paper title

High heritability does not imply accurate prediction under the small additive effects hypothesis

Authors

Arthur Frouin, Claire Dandine-Roulland, Morgane Pierre-Jean, Jean-François Deleuze, Christophe Ambroise, Edith Le Floch

Abstract

Genome-Wide Association Studies (GWAS) explain only a small fraction of heritability for most complex human phenotypes. Genomic heritability estimates the variance explained by the SNPs on the whole genome using mixed models and accounts for the many small contributions of SNPs in the explanation of a phenotype. This paper approaches heritability from a machine learning perspective, and examines the close link between mixed models and ridge regression. Our contribution is twofold. First, we propose estimating genomic heritability using a predictive approach via ridge regression and Generalized Cross Validation (GCV). We show that this is consistent with classical mixed model based estimation. Second, we derive simple formulae that express prediction accuracy as a function of the ratio n/p, where n is the population size and p the total number of SNPs. These formulae clearly show that a high heritability does not imply an accurate prediction when p>n. Both the estimation of heritability via GCV and the prediction accuracy formulae are validated using simulated data and real data from UK Biobank.
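
The link between mixed models and ridge regression described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation. It assumes a standardized genotype matrix Z (simulated here as i.i.d. standard normal entries rather than real SNP data), the usual correspondence lambda = p(1 - h2)/h2 between the ridge penalty and heritability under that standardization, and a simple grid search over h2. The function name gcv_heritability and all parameter values are hypothetical.

```python
import numpy as np


def gcv_heritability(Z, y, h2_grid=None):
    """Pick the h2 on a grid that minimizes the ridge GCV criterion.

    Assumes the columns of Z are standardized, so that the ridge penalty
    and heritability are linked by lam = p * (1 - h2) / h2 (an assumption
    of this sketch).
    """
    n, p = Z.shape
    if h2_grid is None:
        h2_grid = np.linspace(0.01, 0.99, 99)

    # Eigendecomposition of K = Z Z^T; the ridge hat matrix is
    # H = K (K + lam I)^{-1}, which shares the eigenvectors of K.
    d, U = np.linalg.eigh(Z @ Z.T)
    d = np.clip(d, 0.0, None)  # guard against tiny negative eigenvalues
    Uty = U.T @ y

    best_h2, best_gcv = None, np.inf
    for h2 in h2_grid:
        lam = p * (1.0 - h2) / h2
        shrink = lam / (d + lam)          # eigenvalues of I - H
        rss = np.sum((shrink * Uty) ** 2)  # ||(I - H) y||^2
        df_resid = np.sum(shrink)          # n - tr(H)
        gcv = n * rss / df_resid**2        # (rss / n) / (1 - tr(H) / n)^2
        if gcv < best_gcv:
            best_h2, best_gcv = h2, gcv
    return best_h2


# Simulated data under the small additive effects model (illustrative values).
rng = np.random.default_rng(0)
n, p, h2_true = 800, 400, 0.6
Z = rng.standard_normal((n, p))
u = rng.normal(0.0, np.sqrt(h2_true / p), size=p)   # many small effects
e = rng.normal(0.0, np.sqrt(1.0 - h2_true), size=n)  # environmental noise
y = Z @ u + e

h2_hat = gcv_heritability(Z, y)
print(f"true h2 = {h2_true}, GCV estimate = {h2_hat:.2f}")
```

The eigendecomposition is computed once, so each candidate h2 costs only O(n) work. With real genotypes, Z would need centering and scaling, and the intercept would need separate handling, which this sketch leaves out.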
