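Predicting the Biological Classification of Cell-Cycle Regulated Genes of Saccharomyces cerevisiae using Community Detection Algorithms on Gene Co-expression Networks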

Paper Title

Predicting the Biological Classification of Cell-Cycle Regulated Genes of Saccharomyces cerevisiae using Community Detection Algorithms on Gene Co-expression Networks

Authors

Clemente, Jhoirene B.; Besas, Gabriel; Callado, Jerick; Evangelista, John Erol

Abstract

The conventional approach to analyzing gene expression data relies on clustering algorithms. Cluster analysis partitions the set of genes according to their similarity in n-dimensional space, and the resulting partition can be used to predict biological classification. In this study, we investigate whether network analysis provides an advantage over this traditional approach. We identify the advantages and disadvantages of value-based and rank-based construction for creating a graph representation of the original time-series gene expression data. We tested four community detection algorithms, namely the Clauset-Newman-Moore (greedy), Louvain, Leiden, and Girvan-Newman algorithms, in predicting the 5 functional groups of genes. We used the Adjusted Rand Index (ARI) to assess the quality of the predicted communities with respect to the biological classifications. We show that Girvan-Newman outperforms the 3 modularity-based algorithms on both value-based and rank-based constructed graphs. Moreover, Girvan-Newman achieves a higher ARI than conventional clustering algorithms such as K-means, spectral, Birch, and agglomerative clustering. This study also provides a tool for graph construction, visualization, and community detection to support further analysis of gene expression data.
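
The abstract names two ways of turning time-series expression data into a graph, a community detection step, and an ARI evaluation, but it does not define the construction rules. The following is a minimal, dependency-free Python sketch of that pipeline under stated assumptions, not the authors' tool. Value-based construction is assumed to link two genes when their Pearson correlation is at least a fixed threshold; rank-based construction is assumed to link each gene to its k most correlated genes (taking the union of both directions); Girvan-Newman is assumed to stop once a target number of communities exists. All function names, the threshold 0.8, k = 2, and the six toy profiles are hypothetical. Of the four algorithms tested, only Girvan-Newman is shown.

```python
import math
from collections import Counter, deque
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length expression time series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx, dy = [a - mx for a in x], [b - my for b in y]
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / den if den else 0.0

def value_based_graph(profiles, threshold):
    """Assumed value-based rule: link two genes whose correlation is >= threshold."""
    adj = {g: set() for g in profiles}
    for g, h in combinations(profiles, 2):
        if pearson(profiles[g], profiles[h]) >= threshold:
            adj[g].add(h)
            adj[h].add(g)
    return adj

def rank_based_graph(profiles, k):
    """Assumed rank-based rule: link each gene to its k most correlated genes."""
    adj = {g: set() for g in profiles}
    for g in profiles:
        others = [h for h in profiles if h != g]
        others.sort(key=lambda h: pearson(profiles[g], profiles[h]), reverse=True)
        for h in others[:k]:
            adj[g].add(h)
            adj[h].add(g)
    return adj

def components(adj):
    """Connected components of an undirected adjacency dict, found by BFS."""
    seen, comps = set(), []
    for s in adj:
        if s not in seen:
            comp, queue = {s}, deque([s])
            while queue:
                new = adj[queue.popleft()] - comp
                comp |= new
                queue.extend(new)
            seen |= comp
            comps.append(comp)
    return comps

def edge_betweenness(adj):
    """Brandes-style shortest-path edge betweenness for an unweighted graph."""
    eb = Counter()
    for s in adj:
        dist, sigma, preds = {s: 0}, Counter({s: 1}), {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:  # BFS from s, counting shortest paths into each node
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = Counter()
        for w in reversed(order):  # accumulate dependencies back toward s
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                eb[frozenset((v, w))] += share
                delta[v] += share
    return eb

def girvan_newman(adj, n_communities):
    """Cut the highest-betweenness edge until at least n_communities components exist."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # copy; the caller's graph is untouched
    while len(components(adj)) < n_communities and any(adj.values()):
        eb = edge_betweenness(adj)
        v, w = max(eb, key=eb.get)
        adj[v].discard(w)
        adj[w].discard(v)
    return components(adj)

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index via the standard pair-counting formula."""
    index = sum(math.comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(math.comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(math.comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / math.comb(len(labels_true), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions all singletons
        return 1.0
    return (index - expected) / (max_index - expected)

profiles = {  # hypothetical toy data: "b" genes mirror the "a" genes
    "a1": [0, 1, 2, 3, 2, 1, 0, 0], "a2": [0, 1, 2, 3, 3, 1, 0, 0],
    "a3": [1, 1, 2, 3, 2, 1, 0, 0], "b1": [3, 2, 1, 0, 1, 2, 3, 3],
    "b2": [3, 2, 1, 0, 0, 2, 3, 3], "b3": [2, 2, 1, 0, 1, 2, 3, 3],
}
genes = sorted(profiles)
truth = [0 if g.startswith("a") else 1 for g in genes]
for graph in (value_based_graph(profiles, 0.8), rank_based_graph(profiles, 2)):
    label = {g: i for i, comm in enumerate(girvan_newman(graph, 2)) for g in comm}
    print(adjusted_rand_index(truth, [label[g] for g in genes]))
```

On this toy data both constructions yield two disconnected triangles, so the script prints 1.0 twice. The stopping rule is a design choice: Girvan-Newman produces a whole hierarchy of splits, and cutting it at a fixed count (for example, the 5 functional groups) is only one option. Because betweenness is recomputed after every cut, this sketch is slow on large graphs; library implementations such as `girvan_newman` in NetworkX and `adjusted_rand_score` in scikit-learn would be the usual choice for real data.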
