DOI: 10.1145/3583780.3614865

Enhanced Template-Free Reaction Prediction with Molecular Graphs and Sequence-based Data Augmentation

Published: 21 October 2023

Abstract

Retrosynthesis and forward synthesis prediction are fundamental challenges in organic synthesis, computer-aided synthesis planning (CASP), and computer-aided drug design (CADD). The objective is to predict plausible reactants for a given target product (retrosynthesis) or, conversely, the product formed from given reactants (forward synthesis). With the rapid development of deep learning, numerous approaches have been proposed to solve this problem from various perspectives. Methods based on molecular graphs benefit from the rich structural features embedded in graphs, but have difficulty applying existing sequence-based data augmentations because of the permutation invariance of graph structures. In this work, we propose SeqAGraph, a template-free approach that annotates each input graph with the index of its root atom to ensure compatibility with sequence-based data augmentation. The matrix product for global attention in the graph encoder is implemented by indexing, elementwise product, and aggregation, fusing global attention with local message passing without graph padding. Experiments demonstrate that SeqAGraph fully benefits from both molecular graphs and sequence-based data augmentation, and achieves state-of-the-art accuracy among template-free approaches.
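
The root-atom annotation mirrors the standard SMILES-enumeration augmentation: each augmented sample is the same molecule serialized from a different starting atom, and that atom's index is carried alongside the graph. The sketch below is a minimal illustration using RDKit's rootedAtAtom option; the helper name and the simple sequential root-sampling policy are assumptions for illustration, not the authors' code.

```python
from rdkit import Chem

def rooted_smiles_augment(smiles: str, n_aug: int = 4):
    """Illustrative augmentation: re-root the SMILES traversal at different
    atoms and keep each root-atom index as the graph annotation.
    (Hypothetical helper; the paper's actual sampling may differ.)"""
    mol = Chem.MolFromSmiles(smiles)
    pairs = []
    for root in range(min(n_aug, mol.GetNumAtoms())):
        # canonical=False preserves the traversal order induced by the root
        smi = Chem.MolToSmiles(mol, rootedAtAtom=root, canonical=False)
        pairs.append((smi, root))  # (augmented sequence, root-atom index)
    return pairs

# Example: four rooted variants of aspirin, each tagged with its root index
print(rooted_smiles_augment("CC(=O)Oc1ccccc1C(=O)O"))
```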
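The padding-free global attention can be realized with scatter-style operations over a concatenated batch of graphs: attention logits come from indexing Q and K with node-pair index vectors followed by an elementwise product and sum, and the softmax and value aggregation are per-node reductions. Below is a minimal PyTorch sketch under that batching assumption; the pair_src/pair_dst pair enumeration and function name are illustrative, not the paper's implementation.

```python
import torch

def pairwise_global_attention(q, k, v, pair_src, pair_dst):
    """Global attention without a padded [B, L, L] score matrix.
    q, k, v: [N, d] node features for all graphs in a batch, concatenated;
    (pair_src, pair_dst): [P] index vectors enumerating intra-graph node pairs.
    (A sketch of the indexing + elementwise product + aggregation idea.)"""
    n, d = q.shape
    # 1) indexing + elementwise product + sum replaces the Q K^T matmul
    logits = (q[pair_src] * k[pair_dst]).sum(dim=-1) / d ** 0.5      # [P]
    # 2) softmax per query node via two index_add_ aggregations
    #    (subtracting the global max is a simplification for stability)
    w = (logits - logits.max()).exp()
    denom = torch.zeros(n, device=q.device).index_add_(0, pair_src, w)
    w = w / denom[pair_src]                                          # [P]
    # 3) weighted aggregation of values back onto the query nodes
    out = torch.zeros_like(v).index_add_(0, pair_src, w.unsqueeze(-1) * v[pair_dst])
    return out

# Toy batch: two graphs with 3 and 2 nodes; pairs stay within each graph,
# so no cross-graph attention and no padding are ever introduced.
src, dst, offset = [], [], 0
for size in (3, 2):
    for i in range(size):
        for j in range(size):
            src.append(offset + i)
            dst.append(offset + j)
    offset += size
q, k, v = (torch.randn(5, 8) for _ in range(3))
out = pairwise_global_attention(q, k, v, torch.tensor(src), torch.tensor(dst))
```

Because the reductions are indexed by node, graphs of different sizes share one flat tensor, which is what lets the global attention fuse with local message passing without padding each graph to a common length.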


Cited By

  • Application of Transformers in Cheminformatics. Journal of Chemical Information and Modeling, Vol. 64, 11 (2024), 4392--4409. DOI: 10.1021/acs.jcim.3c02070. Online publication date: 30-May-2024.


Published In

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
October 2023
5508 pages
ISBN: 9798400701245
DOI: 10.1145/3583780
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. data augmentation
  2. forward synthesis
  3. graph neural network
  4. retrosynthesis
  5. transformer

Qualifiers

  • Research-article


Conference

CIKM '23

Acceptance Rates

Overall acceptance rate: 1,861 of 8,427 submissions (22%)

