A comprehensive study on the prediction reliability of graph neural networks for virtual screening

Title

A comprehensive study on the prediction reliability of graph neural networks for virtual screening

Authors

Yang, Soojung; Lee, Kyung Hoon; Ryu, Seongok

Abstract

Prediction models based on deep neural networks are attracting increasing attention as fast and accurate virtual screening systems. For decision making in virtual screening, researchers find it useful to interpret the output of a classification model as a probability, since this interpretation allows them to select the more desirable compounds. However, the probabilistic interpretation does not hold for models that suffer from over-parameterization or inappropriate regularization, which leads to unreliable predictions and decisions. We therefore examine the reliability of neural models that predict molecular properties, especially when they are trained on sparse data with imbalanced class distributions. Because this work aims to propose guidelines for training reliable models, we provide methodological details and ablation studies on the following training principles. We investigate the effects of model architectures, regularization methods, and loss functions on the prediction performance and on the reliability of classification results. Moreover, we evaluate the prediction reliability of the models in a virtual screening scenario. Our results highlight that the correct choice of regularization and inference methods is clearly important for achieving a high success rate, especially when the data are imbalanced. All experiments were performed under a single unified model implementation to alleviate external randomness in model training and to enable precise comparison of results.
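
The abstract's concern about reading classifier outputs as probabilities can be made concrete with a calibration check. The abstract does not say which reliability metric the paper uses; the sketch below assumes a binned calibration error in the style of the expected calibration error (ECE), and the function name `calibration_error` and its 10 equal-width bins are illustrative choices, not the authors' code. It groups compounds by their predicted probability of being active and compares, in each bin, the mean predicted probability with the observed fraction of actives. For a well-calibrated screening model, roughly 90% of the compounds scored near 0.9 should turn out to be active.

```python
from typing import List, Sequence


def calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binned calibration error for a binary classifier (ECE-style sketch).

    Hypothetical helper, not from the paper. probs[i] is the predicted
    probability that compound i is active; labels[i] is 1 (active) or 0.
    """
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and equal in length")
    sum_p: List[float] = [0.0] * n_bins  # summed predicted probability per bin
    sum_y: List[float] = [0.0] * n_bins  # number of actives per bin
    count: List[int] = [0] * n_bins      # number of compounds per bin
    for p, y in zip(probs, labels):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        # Equal-width bins over [0, 1]; p == 1.0 falls into the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        sum_p[b] += p
        sum_y[b] += y
        count[b] += 1
    total = len(probs)
    error = 0.0
    for b in range(n_bins):
        if count[b]:
            # |mean predicted probability - observed fraction of actives|,
            # weighted by the share of compounds in this bin.
            gap = abs(sum_p[b] - sum_y[b]) / count[b]
            error += count[b] / total * gap
    return error


if __name__ == "__main__":
    # Overconfident model: every compound gets 0.9, but only half are active.
    print(round(calibration_error([0.9] * 10, [1] * 5 + [0] * 5), 3))  # 0.4
```

A value near 0 means the scores can reasonably be read as probabilities. Because the average is weighted by bin size, sparsely populated high-probability bins (where candidate hits usually sit in an imbalanced screen) contribute little to it, so inspecting the per-bin gaps can be more informative than the single number.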
