Paper Title

Partial Product Aware Machine Learning on DNA-Encoded Libraries

Authors

Binder, Polina; Lawler, Meghan; Grady, LaShadric; Carlson, Neil; Leelananda, Sumudu; Belyanskaya, Svetlana; Franklin, Joe; Tilmans, Nicolas; Palacci, Henri

Abstract

DNA-encoded libraries (DELs) are used for rapid, large-scale screening of small molecules against a protein target. These combinatorial libraries are built through several cycles of chemistry and DNA ligation, producing large sets of DNA-tagged molecules. Machine learning models trained on DEL data have been shown to be effective at predicting molecules of interest that are dissimilar from those in the original DEL. Machine learning approaches to chemical property prediction rely on the assumption that the property of interest is linked to a single chemical structure. In the context of DNA-encoded libraries, this is equivalent to assuming that every chemical reaction fully yields the desired product. In practice, however, multi-step chemical synthesis sometimes generates partial molecules. Each unique DNA tag in a DEL therefore corresponds to a set of possible molecules. Here, we leverage reaction yield data to enumerate the set of possible molecules corresponding to a given DNA tag. This paper demonstrates that training a custom graph neural network (GNN) on this richer dataset improves accuracy and generalization performance.
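
The enumeration idea in the abstract can be illustrated with a small sketch. The abstract does not say how yield data is turned into a set of molecules, so everything here is an assumption made for illustration: the function name `enumerate_partial_products` is hypothetical, each synthesis cycle is treated as either attaching its building block (with probability equal to that cycle's yield) or leaving the molecule unchanged, and cycles are treated as independent. The building-block labels and yield values are made up.

```python
from itertools import product


def enumerate_partial_products(building_blocks, yields):
    """Return (attached_blocks, probability) pairs for one DNA tag.

    Illustrative assumption, not the paper's stated method: each cycle
    either attaches its building block (probability = that cycle's yield)
    or is skipped, independently of the other cycles.
    """
    if len(building_blocks) != len(yields):
        raise ValueError("need exactly one yield per building block")

    results = []
    # Walk over every success/failure pattern across the synthesis cycles.
    for outcome in product([True, False], repeat=len(building_blocks)):
        probability = 1.0
        attached = []
        for block, cycle_yield, succeeded in zip(building_blocks, yields, outcome):
            probability *= cycle_yield if succeeded else (1.0 - cycle_yield)
            if succeeded:
                attached.append(block)
        if probability > 0.0:
            results.append((tuple(attached), probability))

    # Most likely candidate molecules first.
    results.sort(key=lambda item: item[1], reverse=True)
    return results


# Hypothetical three-cycle tag with made-up yields.
candidates = enumerate_partial_products(["A1", "B7", "C3"], [0.9, 0.5, 0.8])
for blocks, probability in candidates:
    print(blocks, round(probability, 3))
```

Under these assumptions, a single tag maps to a weighted set of candidate structures rather than one molecule, which is the kind of richer training input the abstract describes. A real pipeline would need measured yields and actual reaction chemistry in place of the skip-or-attach simplification.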
