
BiG2S: A dual task graph-to-sequence model for the end-to-end template-free reaction prediction


Abstract

Retrosynthesis and reaction outcome prediction are fundamental problems in organic chemistry and computer-aided synthesis planning (CASP), and they are also crucial components of computer-aided drug design. In recent years, deep learning has spawned a branch of methods that use machine translation frameworks with the SMILES data representation to solve these problems. With the successive introduction of additional inverted reaction data and molecular graph representations, the accuracy and validity of machine-translation-based approaches have been further improved. In this work, we propose a bidirectional graph-to-sequence model (BiG2S) that combines the benefits of inverted reaction training and graph representation. The proposed approach provides high-quality retrosynthesis and forward synthesis predictions simultaneously on various datasets; it achieves \(5.5\%\) top-1 accuracy with only \(0.1\%\) invalid results on the USPTO-50k retrosynthesis task, and maintains \(85.0\%\) top-1 accuracy for outcome prediction with the same model.


References

  1. Blakemore DC, Castro L, Churcher I et al (2018) Organic synthesis provides opportunities to transform drug discovery. Nat Chem 10(4):383–394. https://doi.org/10.1038/s41557-018-0021-z

  2. Coley CW, Green WH, Jensen KF (2018) Machine learning in computer-aided synthesis planning. Acc Chem Res 51(5):1281–1289. https://doi.org/10.1021/acs.accounts.8b00087

  3. Segler MHS, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chem - Eur J 23(25):5966–5971. https://doi.org/10.1002/chem.201605499

  4. Segler MHS, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic ai. Nature 555(7698):604–610. https://doi.org/10.1038/nature25978

  5. Coley CW, Rogers L, Green WH et al (2017) Computer-assisted retrosynthesis based on molecular similarity. ACS Cent Sci 3(12):1237–1245. https://doi.org/10.1021/acscentsci.7b00355

  6. Dai H, Li C, Coley C et al (2019) Retrosynthesis prediction with conditional graph logic network. Adv Neural Inf Process Syst 32

  7. Chen S, Jung Y (2021) Deep retrosynthetic reaction prediction using local reactivity and global attention. JACS Au 1(10):1612–1620. https://doi.org/10.1021/jacsau.1c00246

  8. Liu B, Ramsundar B, Kawthekar P et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS Cent Sci 3(10):1103–1113. https://doi.org/10.1021/acscentsci.7b00303

  9. Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: International Conference on Artificial Neural Networks, Springer, pp 817–830

  10. Zheng S, Rao J, Zhang Z et al (2019) Predicting retrosynthetic reactions using self-corrected transformer neural networks. J Chem Inf Model 60(1):47–55. https://doi.org/10.1021/acs.jcim.9b00949

  11. Schwaller P, Laino T, Gaudin T et al (2019) Molecular transformer: A model for uncertainty-calibrated chemical reaction prediction. ACS Cent Sci 5(9):1572–1583. https://doi.org/10.1021/acscentsci.9b00576

  12. Lin K, Xu Y, Pei J et al (2020) Automatic retrosynthetic route planning using template-free models. Chem Sci 11(12):3355–3364. https://doi.org/10.1039/c9sc03666k

  13. Tetko IV, Karpov P, Van Deursen R et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nat Commun 11(1):1–11. https://doi.org/10.1038/s41467-020-19266-y

  14. Kim E, Lee D, Kwon Y et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. J Chem Inf Model 61(1):123–133. https://doi.org/10.1021/acs.jcim.0c01074

  15. Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. J Chem Inf Model 62(15):3503–3513. https://doi.org/10.1021/acs.jcim.2c00321

  16. Wan Y, Hsieh CY, Liao B et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, PMLR, pp 22475–22490

  17. Vaswani A, Shazeer N, Parmar N et al (2017) Attention is all you need. Adv Neural Inf Process Syst 30

  18. Shi C, Xu M, Guo H et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International Conference on Machine Learning, PMLR, pp 8818–8827

  19. Yan C, Ding Q, Zhao P et al (2020) Retroxpert: Decompose retrosynthesis prediction like a chemist. Adv Neural Inf Process Syst 33:11248–11258

  20. Somnath VR, Bunne C, Coley C et al (2021) Learning graph models for retrosynthesis prediction. Adv Neural Inf Process Syst 34:9405–9415

  21. Sacha M, Błaż M, Byrski P et al (2021) Molecule edit graph attention network: Modeling chemical reactions as sequences of graph edits. J Chem Inf Model 61(7):3273–3284. https://doi.org/10.1021/acs.jcim.1c00537

  22. Gilmer J, Schoenholz SS, Riley PF et al (2017) Neural message passing for quantum chemistry. In: International Conference on Machine Learning, PMLR, pp 1263–1272

  23. Yang K, Swanson K, Jin W et al (2019) Analyzing learned molecular representations for property prediction. J Chem Inf Model 59(8):3370–3388. https://doi.org/10.1021/acs.jcim.9b00237

  24. Song Y, Zheng S, Niu Z et al (2020) Communicative representation learning on attributed molecular graphs. In: International Joint Conference on Artificial Intelligence, pp 2831–2838, https://doi.org/10.24963/ijcai.2020/392

  25. Hendrycks D, Gimpel K (2016) Gaussian error linear units (gelus). https://doi.org/10.48550/arxiv.1606.08415

  26. Cho K, Van Merriënboer B, Bahdanau D et al (2014) Learning phrase representations using rnn encoder-decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 1724–1734

  27. Ying C, Cai T, Luo S et al (2021) Do transformers really perform badly for graph representation? Adv Neural Inf Process Syst 34:28877–28888

  28. Dauphin YN, Fan A, Auli M et al (2017) Language modeling with gated convolutional networks. In: International Conference on Machine Learning, PMLR, pp 933–941

  29. Zhang B, Sennrich R (2019) Root mean square layer normalization. Adv Neural Inf Process Syst 32

  30. Wang H, Ma S, Dong L et al (2022) Deepnet: Scaling transformers to 1,000 layers. https://doi.org/10.48550/arxiv.2203.00555

  31. Su J, Lu Y, Pan S et al (2021) Roformer: Enhanced transformer with rotary position embedding. https://doi.org/10.48550/arxiv.2104.09864

  32. Lin TY, Goyal P, Girshick R et al (2017) Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp 2980–2988, https://doi.org/10.1109/iccv.2017.324

  33. Szegedy C, Vanhoucke V, Ioffe S et al (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2818–2826

  34. Müller R, Kornblith S, Hinton GE (2019) When does label smoothing help? Adv Neural Inf Process Syst 32

  35. Klein G, Kim Y, Deng Y et al (2017) Opennmt: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72

  36. Wolf T, Debut L, Sanh V et al (2020) Transformers: State-of-the-art natural language processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp 38–45, https://doi.org/10.18653/v1/2020.emnlp-demos.6

  37. Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge. https://doi.org/10.17863/CAM.16293

  38. Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. J Chem Inf Model 56(12):2336–2346. https://doi.org/10.1021/acs.jcim.6b00564

  39. Landrum G (2022) Rdkit: Open-source cheminformatics software. https://rdkit.org/

  40. Jin W, Coley CW, Barzilay R et al (2017) Predicting organic reaction outcomes with weisfeiler-lehman network. Adv Neural Inf Process Syst 30

  41. Loshchilov I, Hutter F (2017) Decoupled weight decay regularization. https://doi.org/10.48550/arxiv.1711.05101

  42. Sun R, Dai H, Li L et al (2021) Towards understanding retrosynthesis by energy-based models. Adv Neural Inf Process Syst 34:10186–10194

  43. Irwin R, Dimitriadis S, He J et al (2022) Chemformer: A pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3(1):015022. https://doi.org/10.1088/2632-2153/ac3ffb

  44. Wang X, Li Y, Qiu J et al (2021) Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chem Eng J 420:129845. https://doi.org/10.1016/j.cej.2021.129845

  45. Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nat Commun 14(1):3009. https://doi.org/10.1038/s41467-023-38851-5

  46. Seo SW, Song YY, Yang JY et al (2021) Gta: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539. https://doi.org/10.1609/aaai.v35i1.16131

  47. ASKCOS (2022) Askcos: Software tools for organic synthesis. https://askcos.mit.edu/

  48. Coley CW, Rogers L, Green WH et al (2018) Scscore: Synthetic complexity learned from a reaction corpus. J Chem Inf Model 58(2):252–261. https://doi.org/10.1021/acs.jcim.7b00622


Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61976247) and the Fundamental Research Funds for the Central Universities (No. 2682023ZTPY057). The authors thank all members of the research team for their suggestions and contributions to the research ideas and directions of this work. We thank the AutoDL cloud computing platform for providing GPU rental services. Finally, the authors thank Jianlin Su for his analysis and explanation, on his blog, of the mathematical theory behind the Transformer and various language models.

Author information

Corresponding author

Correspondence to Yongquan Jiang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Feature and parameter settings

Table 10 summarizes the atom and bond features used in BiG2S; most of them are adapted from Graph2SMILES [15]. To support the dual-task capability of the model, an additional task label and optional reaction type information are added to the atom features. A rough illustration of this kind of featurization follows.
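The sketch below is a minimal featurization example using RDKit [39]. The feature value lists, the binary task label, and the optional reaction type slot are illustrative assumptions; the exact feature set and encoding used by BiG2S are those given in Table 10.

```python
from rdkit import Chem

# Illustrative feature value lists; the actual sets in BiG2S follow
# Graph2SMILES [15] and may differ from these assumptions.
ATOM_SYMBOLS = ["C", "N", "O", "S", "F", "Cl", "Br", "I", "P", "unk"]
DEGREES = list(range(7))
FORMAL_CHARGES = [-2, -1, 0, 1, 2]

def one_hot(value, choices):
    """One-hot encode `value`, mapping unseen values to the last slot."""
    vec = [0] * len(choices)
    vec[choices.index(value) if value in choices else len(choices) - 1] = 1
    return vec

def atom_features(atom, task_label, reaction_type=None):
    """Build one atom's feature vector; `task_label` is a hypothetical
    binary flag (e.g., 0 = retrosynthesis, 1 = forward prediction)."""
    feats = (
        one_hot(atom.GetSymbol(), ATOM_SYMBOLS)
        + one_hot(atom.GetDegree(), DEGREES)
        + one_hot(atom.GetFormalCharge(), FORMAL_CHARGES)
        + [int(atom.GetIsAromatic()), task_label]
    )
    if reaction_type is not None:  # optional reaction class information
        feats.append(reaction_type)
    return feats

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
atom_vectors = [atom_features(a, task_label=0) for a in mol.GetAtoms()]
```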

Table 10 Atom and bond features used in the model
Table 11 Initialization for BiG2S

BiG2S adopts a dedicated initialization and per-layer normalization scheme for the Transformer, based on DeepNet [30]. Since the output of a GLU [28] has a significantly smaller variance than that of the vanilla FFN [17] when the inputs follow the same distribution (e.g., a standard normal distribution), BiG2S initializes these two structures differently. The initialization and normalization methods are shown in Tables 11 and 12 (a code sketch follows Table 12), where \(\textbf{W}_{query}\), \(\textbf{W}_{key}\), \(\textbf{W}_{value}\), and \(\textbf{W}_{out}\) denote the learnable weights of the query, key, value, and output projections in the attention layers; \(\textbf{W}_{FFN}\) and \(\textbf{W}_{GLU}\) denote the learnable weights of all linear layers in the FFN and GLU; and N and M denote the number of encoder and decoder layers, respectively.

Table 12 Normalization for BiG2S
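As a minimal PyTorch sketch of the DeepNet-style scheme described above: the residual connection is scaled as \(x_{l+1} = \mathrm{LN}(\alpha x_l + G(x_l))\), and selected weights are initialized with a gain \(\beta\). The encoder-side \(\alpha\) and \(\beta\) formulas below are those reported for encoder-decoder models in DeepNet [30]; the GLU-specific gain that BiG2S uses to compensate for the smaller GLU output variance is given in Table 11 and is not reproduced here.

```python
import torch.nn as nn

class DeepNormResidual(nn.Module):
    """Post-LN residual connection with DeepNorm scaling:
    x_{l+1} = LayerNorm(alpha * x + sublayer(x))."""
    def __init__(self, sublayer, d_model, alpha):
        super().__init__()
        self.sublayer = sublayer
        self.alpha = alpha
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return self.norm(self.alpha * x + self.sublayer(x))

def init_scaled(linear, beta):
    """Xavier-normal initialization scaled by the DeepNorm gain beta,
    applied to the value/output projections and the FFN/GLU weights."""
    nn.init.xavier_normal_(linear.weight, gain=beta)
    if linear.bias is not None:
        nn.init.zeros_(linear.bias)

# Encoder-side constants from DeepNet for an encoder-decoder model
# with N encoder layers and M decoder layers:
N, M = 6, 6
alpha_enc = 0.81 * (N ** 4 * M) ** (1 / 16)
beta_enc = 0.87 * (N ** 4 * M) ** (-1 / 16)
```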

Additionally, the hyperparameter settings for each dataset are shown in Table 13. The primary batch loading unit for BiG2S is the number of chemical reactions; because USPTO-full contains extremely long reactions, additional limits on the reactant and product token counts are used to constrain batch loading (a sketch of this scheme follows Table 13).

Table 13 Hyperparameter settings for each dataset
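A minimal sketch of such a batching scheme is shown below; the cap values are placeholders for illustration, not the settings from Table 13.

```python
def make_batches(reactions, max_rxns=64, max_src_tokens=4096, max_tgt_tokens=4096):
    """Group (src_tokens, tgt_tokens) pairs into batches capped by the
    number of reactions and by the total reactant (source) and product
    (target) token counts, so a few very long reactions cannot blow up
    the effective batch size."""
    batch, src_total, tgt_total = [], 0, 0
    for src, tgt in reactions:
        too_big = (
            len(batch) + 1 > max_rxns
            or src_total + len(src) > max_src_tokens
            or tgt_total + len(tgt) > max_tgt_tokens
        )
        if batch and too_big:
            yield batch
            batch, src_total, tgt_total = [], 0, 0
        batch.append((src, tgt))
        src_total += len(src)
        tgt_total += len(tgt)
    if batch:
        yield batch
```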

Appendix B: Statistical results during training and visualization results in dual-task inference

The change in validation-set performance over the course of training and the training accuracy statistics for individual SMILES tokens are shown in Figs. 4 and 5, respectively.

Fig. 4

The change of accuracy and SMILES invalid rate on the USPTO-50k validation set during model training. The right-hand vertical axis uses a logarithmic scale from \(0\%\) to \(10\%\) in order to display the invalid rate of the top-1 result

Fig. 5

The total number of occurrences of each SMILES token in the USPTO-50k training set and the prediction accuracy during training. The blue bars indicate the number of token occurrences, and the red dashed line indicates the prediction accuracy of each token
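The token statistics in Fig. 5 can be reproduced in principle by tokenizing each SMILES string and counting token occurrences. The sketch below uses the regular-expression tokenizer popularized by the Molecular Transformer [11]; whether BiG2S uses exactly this tokenizer is an assumption.

```python
import re
from collections import Counter

# SMILES tokenizer regex after Schwaller et al. [11].
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\."
    r"|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def token_counts(smiles_list):
    """Count how often each SMILES token occurs across a dataset."""
    counts = Counter()
    for smiles in smiles_list:
        counts.update(SMILES_TOKEN_RE.findall(smiles))
    return counts

print(token_counts(["CC(=O)Oc1ccccc1C(=O)O"]).most_common(5))
```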

Fig. 6

Top-5 results for the additional dual-task prediction of reactions in the USPTO-50k test set from BiG2S trained on USPTO-full. Ethane indicates that the predicted SMILES is invalid

Fig. 7

More results for the additional dual-task prediction. Ethane indicates that the predicted SMILES is invalid

We additionally perform reaction outcome prediction for the products and retrosynthesis for the reactants, together with the evaluation of the results and the SCScore [48] obtained from ASKCOS [47]; the visualization results are shown in Figs. 6 and 7. Note that when evaluating the retrosynthesis results of the reactants, a result is marked as “Highly ranked in ASKCOS” if, when it is input into ASKCOS for forward synthesis prediction, its corresponding reactant ranks in the top-5 predictions; this differs from the evaluation of the product’s retrosynthesis results, which requires the product to be the top-1 forward synthesis result from ASKCOS (Fig. 2). Since the training of BiG2S only covers retrosynthesis of products and reaction outcome prediction for reactants, the quality of the model outputs drops markedly on these two additional tasks; in particular, when multiple reactant molecules are analyzed simultaneously in the retrosynthesis task, the model can hardly obtain reasonable predictions.
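The round-trip check described above can be written as a small scoring routine. Here `forward_topk` is a hypothetical wrapper around an external forward synthesis predictor such as ASKCOS [47]; only the RDKit canonicalization calls are concrete APIs.

```python
from rdkit import Chem

def canonical(smiles):
    """Canonical SMILES, or None if the string is invalid."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None

def round_trip_rank(predicted_reactants, target_product, forward_topk, k=5):
    """1-based rank of `target_product` among the top-k forward
    predictions for `predicted_reactants`, or None if it is absent.
    `forward_topk(reactants, k)` stands in for a call to an external
    forward-synthesis predictor. A reactant-side prediction counts as
    "Highly ranked in ASKCOS" when the rank is within the top-5; a
    product-side retrosynthesis prediction instead requires rank == 1."""
    target = canonical(target_product)
    for rank, product in enumerate(forward_topk(predicted_reactants, k), start=1):
        if canonical(product) == target:
            return rank
    return None
```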

From the retrosynthesis results of the reactants, it is noticeable that the model attempts to generate corresponding predictions for each input molecule (such as the decomposition of 1-bromobut-2-yne in Fig. 7), but because this task is not covered during training, a considerable portion of the predicted molecules are identical to the inputs or simply reuse the product of the original reaction as one of the results. The retrosynthesis results are more reasonable when each reactant molecule is input separately, and some of them achieve a high ranking in the ASKCOS evaluation. When performing forward synthesis prediction on each product alone, the results depend mainly on the biases the model acquired during training towards particular reaction types, reaction centers, and functional groups, owing to the lack of constraints and guidance from the other reactants.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Hu, H., Jiang, Y., Yang, Y. et al. BiG2S: A dual task graph-to-sequence model for the end-to-end template-free reaction prediction. Appl Intell 53, 29620–29637 (2023). https://doi.org/10.1007/s10489-023-05048-8

