Paper Title
Contrastive Representation Learning for 3D Protein Structures
Authors
Abstract
Learning from 3D protein structures has gained wide interest in protein modeling and structural bioinformatics. Unfortunately, the number of available structures is orders of magnitude lower than the training data sizes commonly used in computer vision and machine learning. Moreover, this number shrinks even further when only annotated protein structures can be considered, making existing models difficult to train and prone to overfitting. To address this challenge, we introduce a new representation learning framework for 3D protein structures. Our framework uses unsupervised contrastive learning to learn meaningful representations of protein structures, making use of proteins from the Protein Data Bank. We show how these representations can be used to solve a large variety of tasks, such as protein function prediction, protein fold classification, structural similarity prediction, and protein-ligand binding affinity prediction. Moreover, we show how fine-tuned networks, pre-trained with our algorithm, lead to significantly improved task performance, achieving new state-of-the-art results in many tasks.
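The abstract does not state the training objective, so the following is only a minimal sketch of one common contrastive loss (InfoNCE with cosine similarity), not necessarily the loss used in the paper. It assumes that each protein yields two embedding vectors from two different views of its structure. The names `info_nce_loss`, `anchors`, `positives`, and `temperature` are hypothetical and chosen for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch of embedding pairs.

    Assumption: anchors[i] and positives[i] embed two views of the same
    protein, while positives[j] with j != i serve as negatives.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        # Similarity of anchor i to every candidate, scaled by temperature.
        logits = [
            cosine_similarity(anchors[i], positives[j]) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over all candidates.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(x - max_logit) for x in logits)
        )
        # Cross-entropy with the matching pair as the correct class.
        total += log_sum_exp - logits[i]
    return total / n


# Toy embeddings: matched pairs should give a lower loss than swapped pairs.
views_a = [[1.0, 0.0], [0.0, 1.0]]
views_b = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce_loss(views_a, views_b)
swapped_loss = info_nce_loss(views_a, views_b[::-1])
```

In this toy setup the loss is small when each anchor is closest to its own positive and large when the pairs are swapped, which is the behavior a contrastive objective rewards during pre-training.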