ImDrug: A Benchmark for Deep Imbalanced Learning in AI-aided Drug Discovery

Paper Title

ImDrug: A Benchmark for Deep Imbalanced Learning in AI-aided Drug Discovery

Authors

Li, Lanqing; Zeng, Liang; Gao, Ziqi; Yuan, Shen; Bian, Yatao; Wu, Bingzhe; Zhang, Hengtong; Yu, Yang; Lu, Chan; Zhou, Zhipeng; Xu, Hongteng; Li, Jia; Zhao, Peilin; Heng, Pheng-Ann

Abstract

The last decade has witnessed prosperous development of computational methods and dataset curation for AI-aided drug discovery (AIDD). However, real-world pharmaceutical datasets often exhibit highly imbalanced distributions, which the current literature overlooks but which may severely compromise the fairness and generalization of machine learning applications. Motivated by this observation, we introduce ImDrug, a comprehensive benchmark with an open-source Python library that consists of 4 imbalance settings, 11 AI-ready datasets, 54 learning tasks, and 16 baseline algorithms tailored for imbalanced learning. It provides an accessible and customizable testbed for problems and solutions spanning a broad spectrum of the drug discovery pipeline, such as molecular modeling, drug-target interaction, and retrosynthesis. We conduct extensive empirical studies with novel evaluation metrics to demonstrate that existing algorithms fall short of solving medicinal and pharmaceutical challenges in the data imbalance scenario. We believe that ImDrug opens up avenues for future research and development on real-world challenges at the intersection of AIDD and deep imbalanced learning.
