Accelerating high-throughput virtual screening through molecular pool-based active learning

Paper Title

Accelerating high-throughput virtual screening through molecular pool-based active learning

Authors

Graff, David E., Shakhnovich, Eugene I., Coley, Connor W.

Abstract

Structure-based virtual screening is an important tool in early stage drug discovery that scores the interactions between a target protein and candidate ligands. As virtual libraries continue to grow (in excess of $10^8$ molecules), so too do the resources necessary to conduct exhaustive virtual screening campaigns on these libraries. However, Bayesian optimization techniques can aid in their exploration: a surrogate structure-property relationship model trained on the predicted affinities of a subset of the library can be applied to the remaining library members, allowing the least promising compounds to be excluded from evaluation. In this study, we assess various surrogate model architectures, acquisition functions, and acquisition batch sizes as applied to several protein-ligand docking datasets and observe significant reductions in computational costs, even when using a greedy acquisition strategy; for example, 87.9% of the top-50000 ligands can be found after testing only 2.4% of a 100M member library. Such model-guided searches mitigate the increasing computational costs of screening increasingly large virtual libraries and can accelerate high-throughput virtual screening campaigns with applications beyond docking.
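
The pool-based loop described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: `dock` is a hypothetical stand-in for an expensive docking calculation (lower score is better), molecules are reduced to 2D feature vectors, and the surrogate is a toy k-nearest-neighbor regressor instead of the model architectures assessed in the study. Only the overall structure follows the abstract: score a subset, fit a surrogate, then greedily evaluate the candidates with the best predicted scores.

```python
import random


def dock(x):
    # Hypothetical stand-in for an expensive docking call; lower is better.
    return (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2


def sq_dist(a, b):
    # Squared Euclidean distance between two 2D feature vectors.
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def knn_predict(x, labeled, k=3):
    # Toy surrogate: mean score of the k nearest already-scored molecules.
    nearest = sorted(labeled, key=lambda item: sq_dist(item[0], x))[:k]
    return sum(score for _, score in nearest) / len(nearest)


def greedy_search(pool, init_size, batch_size, n_iters, seed=0):
    # Score a random initial subset, then repeatedly acquire the batch of
    # unscored molecules with the lowest predicted score (greedy acquisition).
    rng = random.Random(seed)
    scored = {i: dock(pool[i]) for i in rng.sample(range(len(pool)), init_size)}
    for _ in range(n_iters):
        labeled = [(pool[i], s) for i, s in scored.items()]
        candidates = [i for i in range(len(pool)) if i not in scored]
        candidates.sort(key=lambda i: knn_predict(pool[i], labeled))
        for i in candidates[:batch_size]:
            scored[i] = dock(pool[i])
    return scored


def top_k_recall(pool, scored, k):
    # Fraction of the true top-k molecules that the search actually evaluated.
    true_top = sorted(range(len(pool)), key=lambda i: dock(pool[i]))[:k]
    return sum(1 for i in true_top if i in scored) / k


# Usage: evaluate 200 of 2000 synthetic "molecules" (10% of the pool).
pool_rng = random.Random(42)
pool = [(pool_rng.random(), pool_rng.random()) for _ in range(2000)]
scored = greedy_search(pool, init_size=40, batch_size=40, n_iters=4)
recall = top_k_recall(pool, scored, k=20)
print(f"evaluated {len(scored) / len(pool):.1%} of the pool, top-20 recall = {recall:.2f}")
```

The recall metric here mirrors the kind of figure quoted in the abstract (fraction of top-k ligands recovered for a given fraction of the library evaluated), but the numbers this toy produces say nothing about real docking libraries.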
