Paper Title

A Novel Method for Inference of Acyclic Chemical Compounds with Bounded Branch-height Based on Artificial Neural Networks and Integer Programming

Authors

Azam, Naveed Ahmed; Zhu, Jianshen; Sun, Yanming; Shi, Yu; Shurbevski, Aleksandar; Zhao, Liang; Nagamochi, Hiroshi; Akutsu, Tatsuya

Abstract

Analysis of chemical graphs is a major research topic in computational molecular biology due to its potential applications to drug design. One approach is inverse quantitative structure activity/property relationship (inverse QSAR/QSPR) analysis, which infers chemical structures from given chemical activities/properties. Recently, a framework has been proposed for inverse QSAR/QSPR using artificial neural networks (ANN) and mixed integer linear programming (MILP). This method consists of a prediction phase and an inverse prediction phase. In the first phase, a feature vector $f(G)$ of a chemical graph $G$ is introduced and a prediction function $ψ$ on a chemical property $π$ is constructed with an ANN. In the second phase, given a target value $y^*$ of property $π$, a feature vector $x^*$ is inferred by solving an MILP formulated from the trained ANN so that $ψ(x^*)$ is close to $y^*$, and then a set of chemical structures $G^*$ such that $f(G^*)= x^*$ is enumerated by a graph search algorithm. The framework has been applied to the case of chemical compounds with cycle index up to 2. Computational experiments on instances with $n$ non-hydrogen atoms show that a feature vector $x^*$ can be inferred for up to around $n=40$, whereas graphs $G^*$ can be enumerated only for up to $n=15$. When applied to the case of chemical acyclic graphs, the maximum computable diameter of $G^*$ was around 8. We introduce a new characterization of graph structure, "branch-height," based on which an MILP formulation and a graph search algorithm are designed for chemical acyclic graphs. The results of computational experiments using properties such as octanol/water partition coefficient, boiling point and heat of combustion suggest that the proposed method can infer chemical acyclic graphs $G^*$ with $n=50$ and diameter 30.
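
The abstract measures instance size by the number $n$ of non-hydrogen atoms and by the diameter of the acyclic graph $G^*$. As a minimal illustration of these two quantities only (not of the paper's MILP formulation or graph search algorithm), the following Python sketch computes the diameter of a tree with two breadth-first searches. It assumes that the diameter is counted as the number of edges on a longest path and that hydrogens are suppressed; the function names and the example molecule are hypothetical choices made for this sketch.

```python
from collections import deque


def farthest(adj, start):
    """Breadth-first search; return (vertex, distance) for a vertex farthest from start."""
    dist = {start: 0}
    queue = deque([start])
    last = start
    while queue:
        u = queue.popleft()
        last = u  # BFS pops vertices in nondecreasing distance order
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return last, dist[last]


def tree_diameter(edges):
    """Diameter (in edges) of an acyclic connected graph given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    if not adj:
        return 0
    # Standard two-pass trick for trees: the farthest vertex from any start
    # is one end of a longest path; a second BFS from it gives the diameter.
    end, _ = farthest(adj, next(iter(adj)))
    _, diameter = farthest(adj, end)
    return diameter


# Hypothetical example: hydrogen-suppressed carbon skeleton of 2-methylbutane,
# chain 1-2-3-4 with a methyl carbon 5 attached to carbon 2 (n = 5).
skeleton = [(1, 2), (2, 3), (3, 4), (2, 5)]
n = len({v for edge in skeleton for v in edge})
print(n, tree_diameter(skeleton))
```

Under these assumptions, an unbranched chain of $n$ atoms has diameter $n-1$, so a graph with $n=50$ and diameter 30 necessarily contains branches off its longest path.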
