Paper Title
A Clustering Approach to Integrative Analysis of Multiomic Cancer Data
Authors
Abstract
Rapid technological advances have allowed for molecular profiling across multiple omics domains from a single sample for clinical decision making in many diseases, especially cancer. As tumor development and progression are dynamic biological processes involving composite genomic aberrations, key challenges are to effectively assimilate information from these domains to identify genomic signatures and biological entities that are druggable, develop accurate risk prediction profiles for future patients, and identify novel patient subgroups for tailored therapy and monitoring. We propose integrative probabilistic frameworks for high-dimensional multiple-domain cancer data that coherently incorporate dependence within and between domains to accurately detect tumor subtypes, thus providing a catalogue of genomic aberrations associated with cancer taxonomy. We propose an innovative, flexible and scalable Bayesian nonparametric framework for simultaneous clustering of both tumor samples and genomic probes. We describe an efficient variable selection procedure to identify relevant genomic aberrations that can potentially reveal underlying drivers of a disease. Although the work is motivated by several investigations related to lung cancer, the proposed methods are broadly applicable in a variety of contexts involving high-dimensional data. The success of the methodology is demonstrated using artificial data and lung cancer omics profiles publicly available from The Cancer Genome Atlas.