Paper Title

Supervised clustering of high dimensional data using regularized mixture modeling

Authors

Chang, Wennan; Wan, Changlin; Zang, Yong; Zhang, Chi; Cao, Sha

Abstract

Identifying relationships between molecular variations and their clinical presentations is challenged by the heterogeneous causes of a disease. It is imperative to unveil the relationship between high dimensional molecular manifestations and clinical presentations while taking into account the possible heterogeneity of the study subjects. We propose a novel supervised clustering algorithm based on a penalized mixture regression model, called CSMR, to address the challenges in studying heterogeneous relationships between high dimensional molecular features and a phenotype. The algorithm is adapted from the classification expectation maximization algorithm and offers a novel supervised solution to the clustering problem, with substantial improvement in both computational efficiency and biological interpretability. Experimental evaluation on simulated benchmark datasets demonstrated that CSMR can accurately identify the subspaces on which a subset of features is explanatory to the response variables, and that it outperformed the baseline methods. Application of CSMR to a drug sensitivity dataset again demonstrated its superior performance over the other methods: CSMR is powerful in recapitulating the distinct subgroups hidden in the pool of cell lines with regard to their coping mechanisms to different drugs. CSMR represents a big data analysis tool with the potential to resolve the complexity of translating the clinical manifestations of a disease to the real causes underpinning it. We believe that it will bring new understanding to the molecular basis of disease and could be of special relevance in the growing field of personalized medicine.
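
The abstract describes a classification expectation maximization (CEM) scheme wrapped around a penalized mixture regression. The general idea can be illustrated with the minimal sketch below. It is not the authors' CSMR implementation: the lasso penalty, the fixed number of clusters, the absence of an intercept, and the names `lasso_cd` and `cem_mixture_lasso` are all assumptions made for illustration. Each iteration fits one sparse regression per cluster, then hard-assigns every sample to the cluster whose regression explains its response best.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Lasso by cyclic coordinate descent (hypothetical helper, no intercept).
    Minimizes (1/2n) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()          # residual for beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]      # remove feature j's contribution
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]      # add the updated contribution back
    return beta

def cem_mixture_lasso(X, y, k=2, lam=0.1, n_iter=30, seed=0):
    """Illustrative CEM loop for a mixture of lasso regressions."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    labels = rng.integers(0, k, size=n)     # random initial hard assignment
    betas = np.zeros((k, p))
    sigmas = np.ones(k)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # M-step: refit each cluster's sparse regression on its members.
        for c in range(k):
            idx = labels == c
            if idx.sum() < 2:
                continue                    # keep previous parameters
            betas[c] = lasso_cd(X[idx], y[idx], lam)
            r = y[idx] - X[idx] @ betas[c]
            sigmas[c] = max(r.std(), 1e-3)
            weights[c] = idx.mean()
        # E-step and C-step: Gaussian log-likelihood, then hard assignment.
        resid = y[:, None] - X @ betas.T    # shape (n, k)
        loglik = np.log(weights) - np.log(sigmas) - 0.5 * (resid / sigmas) ** 2
        new_labels = loglik.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break                           # assignments are stable
        labels = new_labels
    return labels, betas

# Synthetic example: two subgroups whose response depends on different features.
rng = np.random.default_rng(1)
n, p = 120, 8
X = rng.normal(size=(n, p))
truth = rng.integers(0, 2, size=n)
y = np.where(truth == 0, 3.0 * X[:, 0], -3.0 * X[:, 1]) + 0.1 * rng.normal(size=n)
labels, betas = cem_mixture_lasso(X, y, k=2, lam=0.1)
```

Because CEM uses hard assignments and a random start, it can settle in a local optimum. In practice one would likely run several restarts and keep the best-scoring fit.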
