Using Interpretable Machine Learning to Massively Increase the Number of Antibody-Virus Interactions Across Studies

Paper title

Using Interpretable Machine Learning to Massively Increase the Number of Antibody-Virus Interactions Across Studies

Authors

Tal Einav, Rong Ma

Abstract

A central challenge in every field of biology is to use existing measurements to predict the outcomes of future experiments. In this work, we consider the wealth of antibody inhibition data against variants of the influenza virus. Due to this virus's genetic diversity and evolvability, the variants examined in one study will often have little-to-no overlap with other studies, making it difficult to discern common patterns or unify datasets for further analysis. To that end, we develop a computational framework that predicts how an antibody or serum would inhibit any variant from any other study. We use this framework to greatly expand seven influenza datasets utilizing hemagglutination inhibition, validating our method upon 200,000 existing measurements and predicting 2,000,000 new values along with their uncertainties. With these new values, we quantify the transferability between seven vaccination and infection studies in humans and ferrets, show that the serum potency is negatively correlated with breadth, and present a tool for pandemic preparedness. This data-driven approach does not require any information beyond each virus's name and measurements, and even datasets with as few as 5 viruses can be expanded, making this approach widely applicable. Future influenza studies using hemagglutination inhibition can directly utilize our curated datasets to predict newly measured antibody responses against ~80 H3N2 influenza viruses from 1968-2011, whereas immunological studies utilizing other viruses or a different assay only need a single partially-overlapping dataset to extend their work. In essence, this approach enables a shift in perspective when analyzing data from "what you see is what you get" into "what anyone sees is what everyone gets."
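The abstract does not spell out the internals of the framework, so the following is only a minimal sketch of the general idea of expanding one study using a partially overlapping one. It assumes log2 titers, synthetic data, and plain linear least squares as a stand-in model; it is not the authors' method. All names (`fit_cross_study`, `predict_missing_virus`, `study_a_overlap`, and so on) are hypothetical.

```python
import numpy as np

def fit_cross_study(overlap_titers, target_titers):
    """Fit target-virus titers from overlapping-virus titers (least squares).

    Returns the coefficients (last entry is the intercept) and the in-sample
    RMSE, used here as a crude uncertainty estimate.
    """
    design = np.column_stack([overlap_titers, np.ones(len(overlap_titers))])
    coef, *_ = np.linalg.lstsq(design, target_titers, rcond=None)
    residuals = target_titers - design @ coef
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    return coef, rmse

def predict_missing_virus(coef, overlap_titers):
    """Predict the unmeasured virus for sera from another study."""
    design = np.column_stack([overlap_titers, np.ones(len(overlap_titers))])
    return design @ coef

# Synthetic, assumed data: study A measured 3 shared viruses plus a 4th one.
rng = np.random.default_rng(0)
true_weights = np.array([0.5, 0.3, 0.2])
study_a_overlap = rng.uniform(2.0, 10.0, size=(50, 3))   # log2 titers
study_a_target = (study_a_overlap @ true_weights + 1.0
                  + rng.normal(0.0, 0.1, size=50))

# Study B measured only the 3 shared viruses on its own 5 sera.
study_b_overlap = rng.uniform(2.0, 10.0, size=(5, 3))

coef, rmse = fit_cross_study(study_a_overlap, study_a_target)
study_b_predicted = predict_missing_virus(coef, study_b_overlap)

for serum_index, value in enumerate(study_b_predicted):
    print(f"serum {serum_index}: predicted log2 titer {value:.2f} +/- {rmse:.2f}")
```

Here the error measured on study A is reused as the uncertainty for study B, which presumes the two studies behave alike; quantifying how well that transfer holds between studies is the kind of question the abstract says the framework addresses.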
