DOI: 10.1145/3583780.3614865

Enhanced Template-Free Reaction Prediction with Molecular Graphs and Sequence-based Data Augmentation

Published: 21 October 2023

Abstract

Retrosynthesis and forward synthesis prediction are fundamental challenges in organic synthesis, computer-aided synthesis planning (CASP), and computer-aided drug design (CADD). The objective is to predict plausible reactants for a given target product (retrosynthesis) or, conversely, the product formed from given reactants (forward synthesis). With the rapid development of deep learning, numerous approaches have been proposed to solve this problem from various perspectives. Methods based on molecular graphs benefit from the rich structural features embedded in graphs, but have difficulty applying existing sequence-based data augmentations because of the permutation invariance of graph structures. In this work, we propose SeqAGraph, a template-free approach that annotates each input graph with the index of its root atom to ensure compatibility with sequence-based data augmentation. The matrix product for global attention in the graph encoder is implemented by indexing, elementwise product, and aggregation, fusing global attention with local message passing without graph padding. Experiments demonstrate that SeqAGraph fully benefits from both molecular graphs and sequence-based data augmentation, and achieves state-of-the-art accuracy among template-free approaches.
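
The root-atom annotation mirrors the standard SMILES-enumeration augmentation: each augmented sample is the same molecule serialized from a different starting atom, and that atom's index is carried alongside the graph. The sketch below is a minimal illustration using RDKit's rootedAtAtom option; the helper name and the simple sequential root-sampling policy are assumptions for illustration, not the authors' code.

```python
from rdkit import Chem

def rooted_smiles_augment(smiles: str, n_aug: int = 4):
    """Illustrative augmentation: re-root the SMILES traversal at different
    atoms and keep each root-atom index as the graph annotation.
    (Hypothetical helper; the paper's actual sampling may differ.)"""
    mol = Chem.MolFromSmiles(smiles)
    pairs = []
    for root in range(min(n_aug, mol.GetNumAtoms())):
        # canonical=False preserves the traversal order induced by the root
        smi = Chem.MolToSmiles(mol, rootedAtAtom=root, canonical=False)
        pairs.append((smi, root))  # (augmented sequence, root-atom index)
    return pairs

# Example: four rooted variants of aspirin, each tagged with its root index
print(rooted_smiles_augment("CC(=O)Oc1ccccc1C(=O)O"))
```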
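The padding-free global attention can be realized with scatter-style operations over a concatenated batch of graphs: attention logits come from indexing Q and K with node-pair index vectors followed by an elementwise product and sum, and the softmax and value aggregation are per-node reductions. Below is a minimal PyTorch sketch under that batching assumption; the pair_src/pair_dst pair enumeration and function name are illustrative, not the paper's implementation.

```python
import torch

def pairwise_global_attention(q, k, v, pair_src, pair_dst):
    """Global attention without a padded [B, L, L] score matrix.
    q, k, v: [N, d] node features for all graphs in a batch, concatenated;
    (pair_src, pair_dst): [P] index vectors enumerating intra-graph node pairs.
    (A sketch of the indexing + elementwise product + aggregation idea.)"""
    n, d = q.shape
    # 1) indexing + elementwise product + sum replaces the Q K^T matmul
    logits = (q[pair_src] * k[pair_dst]).sum(dim=-1) / d ** 0.5      # [P]
    # 2) softmax per query node via two index_add_ aggregations
    #    (subtracting the global max is a simplification for stability)
    w = (logits - logits.max()).exp()
    denom = torch.zeros(n, device=q.device).index_add_(0, pair_src, w)
    w = w / denom[pair_src]                                          # [P]
    # 3) weighted aggregation of values back onto the query nodes
    out = torch.zeros_like(v).index_add_(0, pair_src, w.unsqueeze(-1) * v[pair_dst])
    return out

# Toy batch: two graphs with 3 and 2 nodes; pairs stay within each graph,
# so no cross-graph attention and no padding are ever introduced.
src, dst, offset = [], [], 0
for size in (3, 2):
    for i in range(size):
        for j in range(size):
            src.append(offset + i)
            dst.append(offset + j)
    offset += size
q, k, v = (torch.randn(5, 8) for _ in range(3))
out = pairwise_global_attention(q, k, v, torch.tensor(src), torch.tensor(dst))
```

Because the reductions are indexed by node, graphs of different sizes share one flat tensor, which is what lets the global attention fuse with local message passing without padding each graph to a common length.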


Cited By

  • Application of Transformers in Cheminformatics. Journal of Chemical Information and Modeling, Vol. 64, 11 (2024), 4392--4409. DOI: 10.1021/acs.jcim.3c02070. Online publication date: 30-May-2024.


Published In

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
October 2023
5508 pages
ISBN: 9798400701245
DOI: 10.1145/3583780
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. data augmentation
  2. forward synthesis
  3. graph neural network
  4. retrosynthesis
  5. transformer

Qualifiers

  • Research-article


Conference

CIKM '23

Acceptance Rates

Overall acceptance rate: 1,861 of 8,427 submissions (22%)

