
BiG2S: A dual task graph-to-sequence model for the end-to-end template-free reaction prediction


Abstract

Retrosynthesis and reaction outcome prediction are fundamental problems in organic chemistry and computer-aided synthesis planning (CASP), and they are also crucial components of computer-aided drug design. In recent years, deep learning has spawned a branch of methods that use machine translation frameworks with the SMILES data representation to solve these problems. With the successive introduction of additional inverted reaction data and molecular graph representations, the accuracy and validity of machine-translation-based approaches have been further improved. In this work, we propose a bidirectional graph-to-sequence model (BiG2S) that combines the benefits of inverted reaction training and graph representation. The proposed approach provides high-quality retrosynthesis and forward synthesis predictions simultaneously on various datasets; it achieves \(5.5\%\) top-1 accuracy with only \(0.1\%\) invalid results on the USPTO-50k retrosynthesis task, and maintains \(85.0\%\) top-1 accuracy for outcome prediction with the same model.


References

  1. Blakemore DC, Castro L, Churcher I et al (2018) Organic synthesis provides opportunities to transform drug discovery. Nat Chem 10(4):383–394. https://doi.org/10.1038/s41557-018-0021-z

  2. Coley CW, Green WH, Jensen KF (2018) Machine learning in computer-aided synthesis planning. Acc Chem Res 51(5):1281–1289. https://doi.org/10.1021/acs.accounts.8b00087

  3. Segler MHS, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chem - Eur J 23(25):5966–5971. https://doi.org/10.1002/chem.201605499

  4. Segler MHS, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610. https://doi.org/10.1038/nature25978

  5. Coley CW, Rogers L, Green WH et al (2017) Computer-assisted retrosynthesis based on molecular similarity. ACS Cent Sci 3(12):1237–1245. https://doi.org/10.1021/acscentsci.7b00355

  6. Dai H, Li C, Coley C et al (2019) Retrosynthesis prediction with conditional graph logic network. Adv Neural Inf Process Syst 32

  7. Chen S, Jung Y (2021) Deep retrosynthetic reaction prediction using local reactivity and global attention. JACS Au 1(10):1612–1620. https://doi.org/10.1021/jacsau.1c00246

  8. Liu B, Ramsundar B, Kawthekar P et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS Cent Sci 3(10):1103–1113. https://doi.org/10.1021/acscentsci.7b00303

  9. Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: International Conference on Artificial Neural Networks, Springer, pp 817–830

  10. Zheng S, Rao J, Zhang Z et al (2019) Predicting retrosynthetic reactions using self-corrected transformer neural networks. J Chem Inf Model 60(1):47–55. https://doi.org/10.1021/acs.jcim.9b00949

  11. Schwaller P, Laino T, Gaudin T et al (2019) Molecular transformer: A model for uncertainty-calibrated chemical reaction prediction. ACS Cent Sci 5(9):1572–1583. https://doi.org/10.1021/acscentsci.9b00576

  12. Lin K, Xu Y, Pei J et al (2020) Automatic retrosynthetic route planning using template-free models. Chem Sci 11(12):3355–3364. https://doi.org/10.1039/c9sc03666k

  13. Tetko IV, Karpov P, Van Deursen R et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nat Commun 11(1):1–11. https://doi.org/10.1038/s41467-020-19266-y

  14. Kim E, Lee D, Kwon Y et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. J Chem Inf Model 61(1):123–133. https://doi.org/10.1021/acs.jcim.0c01074

  15. Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. J Chem Inf Model 62(15):3503–3513. https://doi.org/10.1021/acs.jcim.2c00321

  16. Wan Y, Hsieh CY, Liao B et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, PMLR, pp 22475–22490

  17. Vaswani A, Shazeer N, Parmar N et al (2017) Attention is all you need. Adv Neural Inf Process Syst 30

  18. Shi C, Xu M, Guo H et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International Conference on Machine Learning, PMLR, pp 8818–8827

  19. Yan C, Ding Q, Zhao P et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. Adv Neural Inf Process Syst 33:11248–11258

  20. Somnath VR, Bunne C, Coley C et al (2021) Learning graph models for retrosynthesis prediction. Adv Neural Inf Process Syst 34:9405–9415

  21. Sacha M, Błaż M, Byrski P et al (2021) Molecule edit graph attention network: Modeling chemical reactions as sequences of graph edits. J Chem Inf Model 61(7):3273–3284. https://doi.org/10.1021/acs.jcim.1c00537

  22. Gilmer J, Schoenholz SS, Riley PF et al (2017) Neural message passing for quantum chemistry. In: International Conference on Machine Learning, PMLR, pp 1263–1272

  23. Yang K, Swanson K, Jin W et al (2019) Analyzing learned molecular representations for property prediction. J Chem Inf Model 59(8):3370–3388. https://doi.org/10.1021/acs.jcim.9b00237

  24. Song Y, Zheng S, Niu Z et al (2020) Communicative representation learning on attributed molecular graphs. In: International Joint Conference on Artificial Intelligence, pp 2831–2838, https://doi.org/10.24963/ijcai.2020/392

  25. Hendrycks D, Gimpel K (2016) Gaussian error linear units (gelus). https://doi.org/10.48550/arxiv.1606.08415

  26. Cho K, Van Merriënboer B, Bahdanau D et al (2014) Learning phrase representations using rnn encoder-decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 1724–1734

  27. Ying C, Cai T, Luo S et al (2021) Do transformers really perform badly for graph representation? Adv Neural Inf Process Syst 34:28877–28888

  28. Dauphin YN, Fan A, Auli M et al (2017) Language modeling with gated convolutional networks. In: International Conference on Machine Learning, PMLR, pp 933–941

  29. Zhang B, Sennrich R (2019) Root mean square layer normalization. Adv Neural Inf Process Syst 32

  30. Wang H, Ma S, Dong L et al (2022) Deepnet: Scaling transformers to 1,000 layers. https://doi.org/10.48550/arxiv.2203.00555

  31. Su J, Lu Y, Pan S et al (2021) Roformer: Enhanced transformer with rotary position embedding. https://doi.org/10.48550/arxiv.2104.09864

  32. Lin TY, Goyal P, Girshick R et al (2017) Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp 2980–2988, https://doi.org/10.1109/iccv.2017.324

  33. Szegedy C, Vanhoucke V, Ioffe S et al (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2818–2826

  34. Müller R, Kornblith S, Hinton GE (2019) When does label smoothing help? Adv Neural Inf Process Syst 32

  35. Klein G, Kim Y, Deng Y et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72

  36. Wolf T, Debut L, Sanh V et al (2020) Transformers: State-of-the-art natural language processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp 38–45, https://doi.org/10.18653/v1/2020.emnlp-demos.6

  37. Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge. https://doi.org/10.17863/CAM.16293

  38. Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. J Chem Inf Model 56(12):2336–2346. https://doi.org/10.1021/acs.jcim.6b00564

  39. Landrum G (2022) Rdkit: Open-source cheminformatics software. https://rdkit.org/

  40. Jin W, Coley CW, Barzilay R et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. Adv Neural Inf Process Syst 30

  41. Loshchilov I, Hutter F (2017) Decoupled weight decay regularization. https://doi.org/10.48550/arxiv.1711.05101

  42. Sun R, Dai H, Li L et al (2021) Towards understanding retrosynthesis by energy-based models. Adv Neural Inf Process Syst 34:10186–10194

  43. Irwin R, Dimitriadis S, He J et al (2022) Chemformer: A pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015022. https://doi.org/10.1088/2632-2153/ac3ffb

  44. Wang X, Li Y, Qiu J et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chem Eng J 420:129845. https://doi.org/10.1016/j.cej.2021.129845

  45. Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nat Commun 14(1):3009. https://doi.org/10.1038/s41467-023-38851-5

  46. Seo SW, Song YY, Yang JY et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539. https://doi.org/10.1609/aaai.v35i1.16131

  47. ASKCOS (2022) Askcos: Software tools for organic synthesis. https://askcos.mit.edu/

  48. Coley CW, Rogers L, Green WH et al (2018) Scscore: Synthetic complexity learned from a reaction corpus. J Chem Inf Model 58(2):252–261. https://doi.org/10.1021/acs.jcim.7b00622


Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61976247) and the Fundamental Research Funds for the Central Universities (No. 2682023ZTPY057). The authors thank all members of the research team for their suggestions and contributions to the research ideas and directions of this work. We thank the AutoDL cloud computing platform for providing GPU rental services. Finally, the authors thank Jianlin Su for his analysis and explanation, on his blog, of the mathematical theory behind the Transformer and various language models.

Author information

Corresponding author

Correspondence to Yongquan Jiang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Feature and parameter settings

Table 10 summarizes the atom and bond features used in BiG2S; most of them are adapted from Graph2SMILES [15]. To support the dual-task capability of the model, an additional task label and optional reaction type information are added to the atom features. A rough illustration of this kind of featurization follows.
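The sketch below is a minimal featurization example using RDKit [39]. The feature value lists, the binary task label, and the optional reaction type slot are illustrative assumptions; the exact feature set and encoding used by BiG2S are those given in Table 10.

```python
from rdkit import Chem

# Illustrative feature value lists; the actual sets in BiG2S follow
# Graph2SMILES [15] and may differ from these assumptions.
ATOM_SYMBOLS = ["C", "N", "O", "S", "F", "Cl", "Br", "I", "P", "unk"]
DEGREES = list(range(7))
FORMAL_CHARGES = [-2, -1, 0, 1, 2]

def one_hot(value, choices):
    """One-hot encode `value`, mapping unseen values to the last slot."""
    vec = [0] * len(choices)
    vec[choices.index(value) if value in choices else len(choices) - 1] = 1
    return vec

def atom_features(atom, task_label, reaction_type=None):
    """Build one atom's feature vector; `task_label` is a hypothetical
    binary flag (e.g., 0 = retrosynthesis, 1 = forward prediction)."""
    feats = (
        one_hot(atom.GetSymbol(), ATOM_SYMBOLS)
        + one_hot(atom.GetDegree(), DEGREES)
        + one_hot(atom.GetFormalCharge(), FORMAL_CHARGES)
        + [int(atom.GetIsAromatic()), task_label]
    )
    if reaction_type is not None:  # optional reaction class information
        feats.append(reaction_type)
    return feats

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
atom_vectors = [atom_features(a, task_label=0) for a in mol.GetAtoms()]
```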

Table 10 Atom and bond features used in the model
Table 11 Initialization for BiG2S

BiG2S adopts a dedicated initialization and per-layer normalization scheme for the Transformer, based on DeepNet [30]. Since the output of a GLU [28] has a significantly smaller variance than that of the vanilla FFN [17] when the inputs follow the same distribution (e.g., a standard normal distribution), BiG2S initializes these two structures differently. The initialization and normalization methods are shown in Tables 11 and 12 (a code sketch follows Table 12), where \(\textbf{W}_{query}\), \(\textbf{W}_{key}\), \(\textbf{W}_{value}\), and \(\textbf{W}_{out}\) denote the learnable weights of the query, key, value, and output projections in the attention layers; \(\textbf{W}_{FFN}\) and \(\textbf{W}_{GLU}\) denote the learnable weights of all linear layers in the FFN and GLU; and N and M denote the number of encoder and decoder layers, respectively.

Table 12 Normalization for BiG2S
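As a minimal PyTorch sketch of the DeepNet-style scheme described above: the residual connection is scaled as \(x_{l+1} = \mathrm{LN}(\alpha x_l + G(x_l))\), and selected weights are initialized with a gain \(\beta\). The encoder-side \(\alpha\) and \(\beta\) formulas below are those reported for encoder-decoder models in DeepNet [30]; the GLU-specific gain that BiG2S uses to compensate for the smaller GLU output variance is given in Table 11 and is not reproduced here.

```python
import torch.nn as nn

class DeepNormResidual(nn.Module):
    """Post-LN residual connection with DeepNorm scaling:
    x_{l+1} = LayerNorm(alpha * x + sublayer(x))."""
    def __init__(self, sublayer, d_model, alpha):
        super().__init__()
        self.sublayer = sublayer
        self.alpha = alpha
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return self.norm(self.alpha * x + self.sublayer(x))

def init_scaled(linear, beta):
    """Xavier-normal initialization scaled by the DeepNorm gain beta,
    applied to the value/output projections and the FFN/GLU weights."""
    nn.init.xavier_normal_(linear.weight, gain=beta)
    if linear.bias is not None:
        nn.init.zeros_(linear.bias)

# Encoder-side constants from DeepNet for an encoder-decoder model
# with N encoder layers and M decoder layers:
N, M = 6, 6
alpha_enc = 0.81 * (N ** 4 * M) ** (1 / 16)
beta_enc = 0.87 * (N ** 4 * M) ** (-1 / 16)
```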

Additionally, the hyperparameter settings for each dataset are shown in Table 13. The primary batch loading unit for BiG2S is the number of chemical reactions; because USPTO-full contains extremely long reactions, additional limits on the reactant and product token counts are used to constrain batch loading (a sketch of this scheme follows Table 13).

Table 13 Hyperparameter settings for each dataset
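A minimal sketch of such a batching scheme is shown below; the cap values are placeholders for illustration, not the settings from Table 13.

```python
def make_batches(reactions, max_rxns=64, max_src_tokens=4096, max_tgt_tokens=4096):
    """Group (src_tokens, tgt_tokens) pairs into batches capped by the
    number of reactions and by the total reactant (source) and product
    (target) token counts, so a few very long reactions cannot blow up
    the effective batch size."""
    batch, src_total, tgt_total = [], 0, 0
    for src, tgt in reactions:
        too_big = (
            len(batch) + 1 > max_rxns
            or src_total + len(src) > max_src_tokens
            or tgt_total + len(tgt) > max_tgt_tokens
        )
        if batch and too_big:
            yield batch
            batch, src_total, tgt_total = [], 0, 0
        batch.append((src, tgt))
        src_total += len(src)
        tgt_total += len(tgt)
    if batch:
        yield batch
```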

Appendix B: Statistical results during training and visualization results in dual-task inference

The change in validation-set performance over the course of training and the training accuracy statistics for individual SMILES tokens are shown in Figs. 4 and 5, respectively.

Fig. 4

The change of accuracy and SMILES invalid rate on the USPTO-50k validation set during model training. The right-hand vertical axis uses a logarithmic scale from \(0\%\) to \(10\%\) in order to display the invalid rate of the top-1 result

Fig. 5

The total number of occurrences of each SMILES token in the USPTO-50k training set and the prediction accuracy during training. The blue bars indicate the number of token occurrences, and the red dashed line indicates the prediction accuracy of each token
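The token statistics in Fig. 5 can be reproduced in principle by tokenizing each SMILES string and counting token occurrences. The sketch below uses the regular-expression tokenizer popularized by the Molecular Transformer [11]; whether BiG2S uses exactly this tokenizer is an assumption.

```python
import re
from collections import Counter

# SMILES tokenizer regex after Schwaller et al. [11].
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\."
    r"|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def token_counts(smiles_list):
    """Count how often each SMILES token occurs across a dataset."""
    counts = Counter()
    for smiles in smiles_list:
        counts.update(SMILES_TOKEN_RE.findall(smiles))
    return counts

print(token_counts(["CC(=O)Oc1ccccc1C(=O)O"]).most_common(5))
```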

Fig. 6

Top-5 results for the additional dual-task prediction of reactions in the USPTO-50k test set from BiG2S trained on USPTO-full. Ethane indicates that the predicted SMILES is invalid

Fig. 7

More results for the additional dual-task prediction. Ethane indicates that the predicted SMILES is invalid

We additionally perform reaction outcome prediction for the products and retrosynthesis for the reactants, together with the evaluation of the results and the SCScore [48] obtained from ASKCOS [47]; the visualization results are shown in Figs. 6 and 7. Note that when evaluating the retrosynthesis results of the reactants, a result is marked as “Highly ranked in ASKCOS” if, when it is input into ASKCOS for forward synthesis prediction, its corresponding reactant ranks in the top-5 predictions; this differs from the evaluation of the product’s retrosynthesis results, which requires the product to be the top-1 forward synthesis result from ASKCOS (Fig. 2). Since the training of BiG2S only covers retrosynthesis of products and reaction outcome prediction for reactants, the quality of the model outputs drops markedly on these two additional tasks; in particular, when multiple reactant molecules are analyzed simultaneously in the retrosynthesis task, the model can hardly obtain reasonable predictions.
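The round-trip check described above can be written as a small scoring routine. Here `forward_topk` is a hypothetical wrapper around an external forward synthesis predictor such as ASKCOS [47]; only the RDKit canonicalization calls are concrete APIs.

```python
from rdkit import Chem

def canonical(smiles):
    """Canonical SMILES, or None if the string is invalid."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None

def round_trip_rank(predicted_reactants, target_product, forward_topk, k=5):
    """1-based rank of `target_product` among the top-k forward
    predictions for `predicted_reactants`, or None if it is absent.
    `forward_topk(reactants, k)` stands in for a call to an external
    forward-synthesis predictor. A reactant-side prediction counts as
    "Highly ranked in ASKCOS" when the rank is within the top-5; a
    product-side retrosynthesis prediction instead requires rank == 1."""
    target = canonical(target_product)
    for rank, product in enumerate(forward_topk(predicted_reactants, k), start=1):
        if canonical(product) == target:
            return rank
    return None
```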

From the retrosynthesis results of the reactants, it is noticeable that the model attempts to generate corresponding predictions for each input molecule (such as the decomposition of 1-bromobut-2-yne in Fig. 7), but because this task is not covered during training, a considerable portion of the predicted molecules are identical to the inputs or simply reuse the product of the original reaction as one of the results. The retrosynthesis results are more reasonable when each reactant molecule is input separately, and some of them achieve a high ranking in the ASKCOS evaluation. When performing forward synthesis prediction on each product alone, the results depend mainly on the biases the model acquired during training towards particular reaction types, reaction centers, and functional groups, owing to the lack of constraints and guidance from the other reactants.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Hu, H., Jiang, Y., Yang, Y. et al. BiG2S: A dual task graph-to-sequence model for the end-to-end template-free reaction prediction. Appl Intell 53, 29620–29637 (2023). https://doi.org/10.1007/s10489-023-05048-8

