Paper Title
Neural-Network-Directed Genetic Programmer for Discovery of Governing Equations
Paper Authors
Paper Abstract
We develop a symbolic regression framework for extracting the governing mathematical expressions from observed data. The evolutionary approach, faiGP, is designed to leverage the properties of a function algebra that have been encoded into a grammar, providing a theoretical guarantee of universal approximation and a way to minimize bloat. In this framework, the choice of grammar operators may be informed by a physical theory or by symmetry considerations. Since there is currently no theory that can derive the 'constants of nature', an empirical investigation into extracting these coefficients from an evolutionary process is of methodological interest. We quantify the impact of different types of regularizers, including a diversity metric adapted from studies of the transcriptome and a complexity measure, on the performance of the framework. Our implementation, which leverages neural networks and a genetic programmer, generates non-trivial symbolically equivalent expressions ("Ramanujan expressions") or approximations with potentially interesting numerical applications. To illustrate the framework, we present a model of ligand-receptor binding kinetics, including an account of gene regulation by transcription factors, and a model of the regulatory range of the cistrome derived from omics data. This study has important implications for the development of data-driven methodologies for discovering governing equations from experimental data produced by new sensing systems and high-throughput screening technologies.
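As a rough illustration of the ideas named in the abstract (a grammar that restricts the search space, an evolutionary search, and a complexity regularizer that discourages bloat), the following is a minimal sketch. It is not the faiGP implementation. The toy grammar, the operator set, the function names (`random_expr`, `mutate`, `fitness`, `evolve`) and the penalty weight are all assumptions made for this example, and the neural-network component and the diversity metric are omitted entirely.

```python
import math
import random

# Hypothetical toy grammar (an assumption, not the paper's grammar):
#   expr -> x | const | (expr op expr) | fn(expr)
BINARY = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
UNARY = {"sin": math.sin, "cos": math.cos}

def random_expr(rng, depth):
    """Sample an expression tree from the toy grammar, bounded by depth."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x",)
        return ("const", round(rng.uniform(-2.0, 2.0), 2))
    if rng.random() < 0.7:
        op = rng.choice(sorted(BINARY))
        return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))
    fn = rng.choice(sorted(UNARY))
    return (fn, random_expr(rng, depth - 1))

def evaluate(expr, x):
    """Evaluate an expression tree at the point x."""
    head = expr[0]
    if head == "x":
        return x
    if head == "const":
        return expr[1]
    if head in BINARY:
        return BINARY[head](evaluate(expr[1], x), evaluate(expr[2], x))
    return UNARY[head](evaluate(expr[1], x))

def size(expr):
    """Node count, used here as a simple complexity measure."""
    return 1 + sum(size(c) for c in expr[1:] if isinstance(c, tuple))

def fitness(expr, data, penalty=0.01):
    """Mean squared error plus a complexity penalty (lower is better)."""
    try:
        mse = sum((evaluate(expr, x) - y) ** 2 for x, y in data) / len(data)
    except (OverflowError, ValueError):
        return math.inf  # numerically invalid candidates are rejected
    if math.isnan(mse):
        return math.inf
    return mse + penalty * size(expr)

def mutate(rng, expr, depth=2):
    """Replace a randomly chosen subtree with a fresh sample from the grammar."""
    children = [i for i, c in enumerate(expr) if isinstance(c, tuple)]
    if not children or rng.random() < 0.3:
        return random_expr(rng, depth)
    i = rng.choice(children)
    return expr[:i] + (mutate(rng, expr[i], depth),) + expr[i + 1:]

def evolve(data, seed=0, pop_size=50, generations=30):
    """Truncation selection with elitism and subtree mutation only."""
    rng = random.Random(seed)
    pop = [random_expr(rng, 3) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda e: fitness(e, data))
        parents = pop[: pop_size // 5]  # keep the best 20% unchanged
        children = [mutate(rng, rng.choice(parents))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return min(pop, key=lambda e: fitness(e, data))

if __name__ == "__main__":
    # Synthetic data from y = x * x, chosen only for illustration.
    samples = [(i / 10, (i / 10) * (i / 10)) for i in range(-10, 11)]
    best = evolve(samples, seed=1)
    print(best, fitness(best, samples))
```

Because every candidate is produced by expanding grammar rules, the search never leaves the space the grammar defines, and the `penalty * size(expr)` term is one simple way to make larger trees pay for their extra nodes. A fuller genetic programmer would also use crossover and a diversity term, which this sketch leaves out.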