DOI: 10.1145/3511808.3557395

MGMAE: Molecular Representation Learning by Reconstructing Heterogeneous Graphs with A High Mask Ratio

Published: 17 October 2022

Abstract

The masked autoencoder (MAE), an effective self-supervised learner for computer vision and natural language processing, has recently been applied to molecular representation learning. In this paper, we identify two issues, overlooked by existing works, in applying MAE to pre-train Transformer-based models on molecular graphs. (1) Because only atoms are abstracted as tokens and reconstructed, the chemical bonds are left undetermined in the decoded molecule, making molecules with different arrangements of the same atoms indistinguishable. (2) Although a high mask ratio, corresponding to a more challenging reconstruction task, has proven beneficial in the vision domain, it cannot be trivially applied to molecular graphs, as graph data carry less redundant information. To resolve these issues, we propose a novel framework, the Molecular Graph Mask AutoEncoder (MGMAE). As its first step, MGMAE transforms each molecular graph into a heterogeneous atom-bond graph to fully exploit the bond attributes, and designs a unidirectional position encoding for such graphs. We then propose a hybrid masking mechanism that exploits the complementarity between atoms' attributive and spatial features. Meanwhile, we compensate for the mask embedding with a dynamic aggregation representation that exploits the correlations between topologically adjacent tokens. As a result, MGMAE can simultaneously reconstruct the masked atoms, the masked bonds, and the relative distances among atoms, all with a high mask ratio. We compare MGMAE with state-of-the-art methods on various molecular benchmarks and show its competitiveness in both regression and classification tasks.
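To make the atom-bond tokenization concrete, the sketch below builds a heterogeneous graph in which bonds are promoted to tokens alongside atoms, so that a masked autoencoder has bond targets to reconstruct. This is a minimal illustration of the idea using RDKit, not the authors' implementation; the `atom_bond_graph` helper, its token format, and the directed atom → bond → atom edges are assumptions made for exposition.

```python
# A minimal sketch (not the authors' code) of a heterogeneous atom-bond graph:
# both atoms and bonds become tokens, so masked bonds can be reconstructed too.
from rdkit import Chem  # assumes RDKit is installed

def atom_bond_graph(smiles: str):
    """Return (tokens, edges): tokens cover atoms AND bonds, and edges link
    each bond token to its endpoint atoms in one direction (u -> bond -> v),
    a crude stand-in for the paper's unidirectional position encoding."""
    mol = Chem.MolFromSmiles(smiles)
    tokens = [("atom", a.GetSymbol()) for a in mol.GetAtoms()]
    edges = []
    for bond in mol.GetBonds():
        b = len(tokens)                        # index of the new bond token
        tokens.append(("bond", str(bond.GetBondType())))
        u, v = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        edges += [(u, b), (b, v)]              # atom -> bond -> atom
    return tokens, edges

tokens, edges = atom_bond_graph("CC(=O)O")     # acetic acid
# tokens: [('atom','C'), ('atom','C'), ('atom','O'), ('atom','O'),
#          ('bond','SINGLE'), ('bond','DOUBLE'), ('bond','SINGLE')]
```

Treating bonds as first-class tokens is what lets the decoder distinguish isomers that share the same multiset of atoms, the failure mode raised in issue (1).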

Supplementary Material

MP4 File (CIKM22-fp0535.mp4)
Molecular Graph Mask AutoEncoder (MGMAE) is a novel framework for molecular property prediction tasks. MGMAE consists of two main parts. First, we transform each molecular graph into a heterogeneous atom-bond graph to fully exploit the bond attributes, and design a unidirectional position encoding for such graphs. Then, we propose three techniques for applying a high mask ratio to molecular graphs: an asymmetric mask-predict mechanism, a hybrid masking mechanism, and a dynamically aggregated mask embedding. As a result, MGMAE can simultaneously reconstruct the masked atoms, the masked bonds, and the relative distances among atoms, all with a high mask ratio.
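As a rough illustration of how the hybrid masking and the dynamic-aggregation mask embedding might interact at a high mask ratio, here is a sketch based only on our reading of the abstract; the `hybrid_mask` function, the 50/50 modality split, and the neighbour-mean aggregation are illustrative assumptions, not the released method.

```python
# Illustrative sketch (our reading of the abstract, not the released code):
# each selected token hides EITHER its attributive OR its spatial features,
# and its mask embedding is seeded by aggregating topological neighbours.
import torch

def hybrid_mask(attr, spatial, adj, mask_ratio=0.7):
    """attr, spatial: (N, d) token features; adj: (N, N) 0/1 float adjacency."""
    n = attr.size(0)
    selected = torch.rand(n) < mask_ratio          # high mask ratio, e.g. 70%
    hide_attr = selected & (torch.rand(n) < 0.5)   # which modality to hide
    hide_spat = selected & ~hide_attr              # the complementary choice
    # dynamic-aggregation stand-in: mean of neighbours' attribute features
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
    neigh_mean = (adj @ attr) / deg
    attr = torch.where(hide_attr.unsqueeze(1), neigh_mean, attr)
    spatial = torch.where(hide_spat.unsqueeze(1),
                          torch.zeros_like(spatial), spatial)
    return attr, spatial, selected                 # selected = reconstruction targets
```

Because the two feature views are never hidden for the same token, the encoder retains a partial signal for every token even at mask ratios that would be destructive for a plain attribute mask.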


Cited By

  • (2024) Impact of Domain Knowledge and Multi-Modality on Intelligent Molecular Property Prediction: A Systematic Survey. Big Data Mining and Analytics, 7(3), 858-888. DOI: 10.26599/BDMA.2024.9020028. Online publication date: Sep-2024.
  • (2024) MMPolymer: A Multimodal Multitask Pretraining Framework for Polymer Property Prediction. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2336-2346. DOI: 10.1145/3627673.3679684. Online publication date: 21-Oct-2024.
  • (2024) RARE: Robust Masked Graph Autoencoder. IEEE Transactions on Knowledge and Data Engineering, 36(10), 5340-5353. DOI: 10.1109/TKDE.2023.3335222. Online publication date: Oct-2024.

    Published In

    CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management
    October 2022, 5274 pages
    ISBN: 9781450392365
    DOI: 10.1145/3511808
    General Chairs: Mohammad Al Hasan, Li Xiong
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. graph representation learning
    2. molecular property prediction
    3. self-supervised learning
    4. transformer

    Qualifiers

    • Research-article

    Conference

    CIKM '22

    Acceptance Rates

    CIKM '22 paper acceptance rate: 621 of 2,257 submissions (28%).
    Overall acceptance rate: 1,861 of 8,427 submissions (22%).

