Paper Title

Modeling Diverse Chemical Reactions for Single-step Retrosynthesis via Discrete Latent Variables

Authors

Huarui He, Jie Wang, Yunfei Liu, Feng Wu

Abstract

Single-step retrosynthesis is the cornerstone of retrosynthesis planning, a crucial task in computer-aided drug discovery. Its goal is to identify the possible reactants that lead to the synthesis of a target product in one reaction. By representing organic molecules as canonical strings, existing sequence-based retrosynthetic methods treat product-to-reactant retrosynthesis as a sequence-to-sequence translation problem. However, most of them struggle to identify diverse chemical reactions for a desired product because their inference is deterministic, which contradicts the fact that many compounds can be synthesized through various reaction types with different sets of reactants. In this work, we aim to increase reaction diversity and generate various reactants using discrete latent variables. We propose a novel sequence-based approach, RetroDVCAE, which incorporates conditional variational autoencoders into single-step retrosynthesis and associates discrete latent variables with the generation process. Specifically, RetroDVCAE uses the Gumbel-Softmax distribution to approximate the categorical distribution over potential reactions and generates multiple sets of reactants with the variational decoder. Experiments demonstrate that RetroDVCAE outperforms state-of-the-art baselines on both a benchmark dataset and a homemade dataset. Both quantitative and qualitative results show that RetroDVCAE can model the multi-modal distribution over reaction types and produce diverse reactant candidates.
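The Gumbel-Softmax relaxation named in the abstract can be illustrated with a minimal, standard-library sketch. This is not the authors' implementation: the function name `gumbel_softmax_sample`, the three-category example, and the temperature value are assumptions made purely for illustration. The sketch adds Gumbel(0, 1) noise to the category logits and applies a temperature-scaled softmax, which yields a relaxed (soft) one-hot vector whose argmax follows the underlying categorical distribution.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature=1.0, rng=random):
    """Draw one relaxed one-hot sample for a categorical with the given logits.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise by inverse transform: g = -log(-log(u)).
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / temperature)

    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed toy setup: three latent categories with probabilities 0.6, 0.3, 0.1.
rng = random.Random(0)
logits = [math.log(0.6), math.log(0.3), math.log(0.1)]
sample = gumbel_softmax_sample(logits, temperature=0.5, rng=rng)
```

Lower temperatures push each sample closer to a hard one-hot vector, while higher temperatures give smoother vectors. In a variational model the soft sample stays differentiable with respect to the logits, which is the usual motivation for the relaxation; how RetroDVCAE feeds the sample into its decoder is not specified in this abstract.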
