Paper title
The C-SHIFT algorithm for normalizing covariances
Paper authors
Paper abstract
Omics technologies are powerful tools for analyzing patterns in gene expression data for thousands of genes. Because of systematic variation in experiments, raw gene expression data are often obscured by undesirable technical noise. Various normalization techniques have been designed to remove these non-biological errors prior to any statistical analysis. One reason for normalizing data is the need to recover the covariance matrix used in gene network analysis. In this paper, we introduce a novel normalization technique called the covariance shift (C-SHIFT) method. This normalization algorithm uses optimization techniques together with the blessing-of-dimensionality philosophy and an energy minimization hypothesis to recover the covariance matrix under additive noise (known in biology as the bias). It is therefore well suited to the analysis of logarithmic gene expression data. Numerical experiments on synthetic data demonstrate the method's advantage over classical normalization techniques, namely the Rank, Quantile, cyclic LOESS (locally estimated scatterplot smoothing), and MAD (median absolute deviation) normalization methods. We also evaluate the performance of the C-SHIFT algorithm on real biological data.
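To make the covariance recovery problem concrete, the following is a minimal sketch of how additive bias distorts a gene-gene covariance matrix. It is not the C-SHIFT algorithm, whose details are not given in the abstract. It assumes, purely for illustration, that each sample carries a single additive bias on the log scale and that this bias is independent of the true expression values. Under that assumption the observed covariance equals the true covariance plus the bias variance added to every entry. The function name `simulate_additive_bias` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_additive_bias(n_genes=5, n_samples=20000, bias_sd=0.7, seed=0):
    """Simulate log-expression data with one additive bias per sample.

    Assumption (illustrative only): the bias is independent of the true
    expression values and is shared by all genes within a sample.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical "true" gene-gene covariance (positive semidefinite).
    a = rng.normal(size=(n_genes, n_genes))
    true_cov = a @ a.T / n_genes

    # True log-expression matrix, shape (n_genes, n_samples).
    x = rng.multivariate_normal(np.zeros(n_genes), true_cov, size=n_samples).T

    # One bias value per sample, broadcast across all genes.
    bias = rng.normal(scale=bias_sd, size=n_samples)
    y = x + bias

    # np.cov treats rows as variables, so both results are n_genes x n_genes.
    cov_clean = np.cov(x)
    cov_biased = np.cov(y)
    return true_cov, cov_clean, cov_biased, bias_sd ** 2


if __name__ == "__main__":
    _, cov_clean, cov_biased, bias_var = simulate_additive_bias()
    # Every entry of the difference should be close to the bias variance.
    print("bias variance:", bias_var)
    print("entrywise shift:\n", cov_biased - cov_clean)
```

In this toy setting every entry of the covariance matrix is shifted by roughly the same constant, so correlations estimated from the biased data are inflated even between unrelated genes. Normalization aims to undo this kind of distortion.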