Paper title

SELFIES and the future of molecular string representations

Authors

Krenn, Mario; Ai, Qianxiang; Barthel, Senja; Carson, Nessa; Frei, Angelo; Frey, Nathan C.; Friederich, Pascal; Gaudin, Théophile; Gayle, Alberto Alexander; Jablonka, Kevin Maik; Lameiro, Rafael F.; Lemm, Dominik; Lo, Alston; Moosavi, Seyed Mohamad; Nápoles-Duarte, José Manuel; Nigam, AkshatKumar; Pollice, Robert; Rajan, Kohulan; Schatzschneider, Ulrich; Schwaller, Philippe; Skreta, Marta; Smit, Berend; Strieth-Kalthoff, Felix; Sun, Chong; Tom, Gary; von Rudorff, Guido Falk; Wang, Andrew; White, Andrew; Young, Adamo; Yu, Rose; Aspuru-Guzik, Alán

Abstract

Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. The machine needs to read and write fluently in a chemical language for each of these tasks. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings -- most pertinently, most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELFIES (SELF-referencIng Embedded Strings). SELFIES has since simplified and enabled numerous new applications in chemistry. In this manuscript, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete Future Projects for robust molecular representations. These involve the extension toward new chemical domains, exciting questions at the interface of AI and robust languages, and interpretability for both humans and machines. We hope that these proposals will inspire several follow-up works exploiting the full potential of molecular string representations for the future of AI in chemistry and materials science.
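
The abstract's point that most symbol combinations are not valid SMILES can be illustrated with a small sketch. The code below is not a SMILES parser and is not taken from the paper: the function names, the eight-symbol alphabet, and the two syntax rules (balanced parentheses, paired ring-closure digits) are assumptions chosen for illustration. The check is necessary but not sufficient, so real validity rates, which also depend on valence rules, would be lower still.

```python
import random

# Hypothetical toy alphabet: three atoms, a double bond, branch
# parentheses, and two ring-closure digits. Not a full SMILES alphabet.
ALPHABET = ["C", "N", "O", "=", "(", ")", "1", "2"]


def crude_smiles_syntax_ok(s: str) -> bool:
    """Necessary (not sufficient) syntax check for a SMILES-like string.

    Only verifies that the string starts with a letter, that branch
    parentheses are balanced, and that every ring-closure digit is
    opened and closed. Valence and bond rules are ignored.
    """
    if not s or not s[0].isalpha():
        return False
    depth = 0
    open_rings = set()
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing a branch that was never opened
                return False
        elif ch.isdigit():
            # A digit opens a ring on first sight and closes it on the next.
            if ch in open_rings:
                open_rings.remove(ch)
            else:
                open_rings.add(ch)
    return depth == 0 and not open_rings


def fraction_passing(n_samples: int = 10_000, length: int = 10, seed: int = 0) -> float:
    """Fraction of uniformly random strings that pass the crude check."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_samples):
        candidate = "".join(rng.choice(ALPHABET) for _ in range(length))
        if crude_smiles_syntax_ok(candidate):
            passed += 1
    return passed / n_samples


if __name__ == "__main__":
    print(crude_smiles_syntax_ok("C1CCCCC1"))  # cyclohexane-like string
    print(crude_smiles_syntax_ok("C(C"))       # unclosed branch
    print(fraction_passing())
```

Even under these lenient rules, only a minority of random strings pass, which is the failure mode a generative model faces when it emits SMILES token by token. A representation in which every string decodes to a molecule, as the abstract describes for SELFIES, avoids this by construction.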
