ABSTRACT
Molecular pre-training, which aims to learn effective molecular representations from large amounts of data, has attracted substantial attention in cheminformatics and bioinformatics. A molecule can be viewed either as a graph (where atoms are connected by bonds) or as a SMILES sequence (obtained by applying a depth-first search to the molecular graph with specific rules). The Transformer and graph neural networks (GNNs) are two representative methods for sequential data and graph data, respectively; they model molecules globally and locally, and are therefore expected to be complementary. In this work, we propose to leverage both representations and design a new pre-training algorithm, dual-view molecule pre-training (briefly, DVMP), that effectively combines the strengths of both types of molecular representations. DVMP has a Transformer branch and a GNN branch, and the two branches are pre-trained to maintain the semantic consistency of molecules. After pre-training, either the Transformer branch (recommended, based on our empirical results), the GNN branch, or both can be used for downstream tasks. DVMP is tested on 11 molecular property prediction tasks and outperforms strong baselines. Furthermore, we test DVMP on three retrosynthesis tasks, where it achieves state-of-the-art results. Our code is released at https://github.com/microsoft/DVMP.
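For concreteness, the two views the abstract describes can be constructed directly with RDKit: the sequence view is a SMILES string produced by a depth-first traversal of the molecular graph, and the graph view is the atom/bond graph itself. A minimal sketch:

```python
from rdkit import Chem

# One molecule, two views: a canonical SMILES string (sequence view, fed to
# the Transformer branch) and an atom/bond graph (graph view, fed to the GNN).
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin

# Sequence view: canonical SMILES via RDKit's depth-first traversal rules.
smiles_view = Chem.MolToSmiles(mol)

# Graph view: node features (here, atomic numbers) plus an edge list of bonds.
nodes = [atom.GetAtomicNum() for atom in mol.GetAtoms()]
edges = [(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in mol.GetBonds()]
```

The abstract states that the two branches are pre-trained to keep the representations of the same molecule semantically consistent, but it does not spell out the objective. A minimal sketch of one plausible consistency loss, assuming each branch produces a pooled per-molecule embedding (the function and tensor names here are illustrative, not taken from the DVMP codebase):

```python
import torch
import torch.nn.functional as F

def dual_view_consistency_loss(h_transformer: torch.Tensor,
                               h_gnn: torch.Tensor) -> torch.Tensor:
    """Pull the two views of the same molecule together in embedding space.

    h_transformer: pooled SMILES-branch output, shape (batch, dim)
    h_gnn:         pooled graph-branch output,  shape (batch, dim)
    """
    z_t = F.normalize(h_transformer, dim=-1)
    z_g = F.normalize(h_gnn, dim=-1)
    # Maximize cosine similarity between paired views of each molecule.
    return (1.0 - (z_t * z_g).sum(dim=-1)).mean()
```

Pulling paired views together encourages each branch to encode information recoverable from the other, so the global (Transformer) and local (GNN) signals regularize each other during pre-training.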