Paper title
Interpretable Structured Learning with Sparse Gated Sequence Encoder for Protein-Protein Interaction Prediction
Paper authors
Paper abstract
Predicting protein-protein interactions (PPIs) by learning informative representations from amino acid sequences is a challenging yet important problem in biology. Although various deep learning models in Siamese architecture have been proposed to model PPIs from sequences, these methods are computationally expensive for a large number of PPIs due to the pairwise encoding process. Furthermore, these methods are difficult to interpret because of non-intuitive mappings from protein sequences to their sequence representation. To address these challenges, we present a novel deep framework to model and predict PPIs from sequence alone. Our model incorporates a bidirectional gated recurrent unit to learn sequence representations by leveraging contextualized and sequential information from sequences. We further employ a sparse regularization to model long-range dependencies between amino acids and to select important amino acids (protein motifs), thus enhancing interpretability. Besides, the novel design of the encoding process makes our model computationally efficient and scalable to an increasing number of interactions. Experimental results on up-to-date interaction datasets demonstrate that our model achieves superior performance compared to other state-of-the-art methods. Literature-based case studies illustrate the ability of our model to provide biological insights to interpret the predictions.
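The scalability argument in the abstract can be made concrete with a small sketch. The abstract does not spell out the encoding design, so the code below rests on an assumption: that the saving comes from encoding each protein once and reusing its representation across all pairs, instead of re-encoding both sequences for every pair. `CountingEncoder`, `score`, `predict_pairwise`, and `predict_cached` are hypothetical names, and the amino acid composition vector is a toy stand-in for a learned encoder such as a bidirectional GRU.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


class CountingEncoder:
    """Toy stand-in for a sequence encoder; counts how often it is invoked."""

    def __init__(self):
        self.calls = 0

    def __call__(self, seq):
        self.calls += 1
        counts = Counter(seq)
        n = max(len(seq), 1)
        # Amino acid composition vector (a placeholder for a learned embedding).
        return [counts.get(a, 0) / n for a in AMINO_ACIDS]


def score(u, v):
    """Hypothetical interaction score: dot product of two representations."""
    return sum(a * b for a, b in zip(u, v))


def predict_pairwise(encoder, proteins, pairs):
    # Pairwise scheme: both sequences are re-encoded for every pair.
    return [score(encoder(proteins[a]), encoder(proteins[b])) for a, b in pairs]


def predict_cached(encoder, proteins, pairs):
    # Assumed alternative: encode each protein once, then reuse the result.
    emb = {name: encoder(seq) for name, seq in proteins.items()}
    return [score(emb[a], emb[b]) for a, b in pairs]


# Made-up sequences and pairs, for illustration only.
proteins = {"P1": "MKTAYIAKQR", "P2": "GAVLIMKTAY", "P3": "QRQRQRWWYY"}
pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P2", "P1")]

enc_a, enc_b = CountingEncoder(), CountingEncoder()
scores_a = predict_pairwise(enc_a, proteins, pairs)
scores_b = predict_cached(enc_b, proteins, pairs)

print(enc_a.calls, enc_b.calls)
```

Under this assumption the number of encoder calls grows with the number of proteins, not with the number of candidate pairs, which matters because the pairs can grow quadratically in the proteins. Both schemes return the same scores here because the toy encoder treats each sequence independently.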