DOI: 10.1145/3511808.3557395

MGMAE: Molecular Representation Learning by Reconstructing Heterogeneous Graphs with A High Mask Ratio

Published: 17 October 2022

Abstract

The masked autoencoder (MAE), an effective self-supervised learner for computer vision and natural language processing, has recently been applied to molecular representation learning. In this paper, we identify two issues, overlooked by existing works, in applying MAE to pre-train Transformer-based models on molecular graphs. (1) Because only atoms are abstracted as tokens and reconstructed, the chemical bonds are left undetermined in the decoded molecule, making molecules with different arrangements of the same atoms indistinguishable. (2) Although a high mask ratio, corresponding to a more challenging reconstruction task, has proven beneficial in the vision domain, it cannot be trivially applied to molecular graphs, as graph data carry less redundant information. To resolve these issues, we propose a novel framework, the Molecular Graph Mask AutoEncoder (MGMAE). As its first step, MGMAE transforms each molecular graph into a heterogeneous atom-bond graph to fully exploit the bond attributes, and designs a unidirectional position encoding for such graphs. We then propose a hybrid masking mechanism that exploits the complementarity between atoms' attributive and spatial features. Meanwhile, we compensate for the mask embedding with a dynamic aggregation representation that exploits the correlations between topologically adjacent tokens. As a result, MGMAE can simultaneously reconstruct the masked atoms, the masked bonds, and the relative distances among atoms, all with a high mask ratio. We compare MGMAE with state-of-the-art methods on various molecular benchmarks and show its competitiveness in both regression and classification tasks.
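To make the atom-bond tokenization concrete, the sketch below builds a heterogeneous graph in which bonds are promoted to tokens alongside atoms, so that a masked autoencoder has bond targets to reconstruct. This is a minimal illustration of the idea using RDKit, not the authors' implementation; the `atom_bond_graph` helper, its token format, and the directed atom → bond → atom edges are assumptions made for exposition.

```python
# A minimal sketch (not the authors' code) of a heterogeneous atom-bond graph:
# both atoms and bonds become tokens, so masked bonds can be reconstructed too.
from rdkit import Chem  # assumes RDKit is installed

def atom_bond_graph(smiles: str):
    """Return (tokens, edges): tokens cover atoms AND bonds, and edges link
    each bond token to its endpoint atoms in one direction (u -> bond -> v),
    a crude stand-in for the paper's unidirectional position encoding."""
    mol = Chem.MolFromSmiles(smiles)
    tokens = [("atom", a.GetSymbol()) for a in mol.GetAtoms()]
    edges = []
    for bond in mol.GetBonds():
        b = len(tokens)                        # index of the new bond token
        tokens.append(("bond", str(bond.GetBondType())))
        u, v = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        edges += [(u, b), (b, v)]              # atom -> bond -> atom
    return tokens, edges

tokens, edges = atom_bond_graph("CC(=O)O")     # acetic acid
# tokens: [('atom','C'), ('atom','C'), ('atom','O'), ('atom','O'),
#          ('bond','SINGLE'), ('bond','DOUBLE'), ('bond','SINGLE')]
```

Treating bonds as first-class tokens is what lets the decoder distinguish isomers that share the same multiset of atoms, the failure mode raised in issue (1).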

Supplementary Material

MP4 File (CIKM22-fp0535.mp4)
Molecular Graph Mask AutoEncoder (MGMAE) is a novel framework for molecular property prediction tasks. MGMAE consists of two main parts. First, we transform each molecular graph into a heterogeneous atom-bond graph to fully exploit the bond attributes, and design a unidirectional position encoding for such graphs. Then, we propose three techniques for applying a high mask ratio to molecular graphs: an asymmetric mask-predict mechanism, a hybrid masking mechanism, and a dynamically aggregated mask embedding. As a result, MGMAE can simultaneously reconstruct the masked atoms, the masked bonds, and the relative distances among atoms, all with a high mask ratio.
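As a rough illustration of how the hybrid masking and the dynamic-aggregation mask embedding might interact at a high mask ratio, here is a sketch based only on our reading of the abstract; the `hybrid_mask` function, the 50/50 modality split, and the neighbour-mean aggregation are illustrative assumptions, not the released method.

```python
# Illustrative sketch (our reading of the abstract, not the released code):
# each selected token hides EITHER its attributive OR its spatial features,
# and its mask embedding is seeded by aggregating topological neighbours.
import torch

def hybrid_mask(attr, spatial, adj, mask_ratio=0.7):
    """attr, spatial: (N, d) token features; adj: (N, N) 0/1 float adjacency."""
    n = attr.size(0)
    selected = torch.rand(n) < mask_ratio          # high mask ratio, e.g. 70%
    hide_attr = selected & (torch.rand(n) < 0.5)   # which modality to hide
    hide_spat = selected & ~hide_attr              # the complementary choice
    # dynamic-aggregation stand-in: mean of neighbours' attribute features
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
    neigh_mean = (adj @ attr) / deg
    attr = torch.where(hide_attr.unsqueeze(1), neigh_mean, attr)
    spatial = torch.where(hide_spat.unsqueeze(1),
                          torch.zeros_like(spatial), spatial)
    return attr, spatial, selected                 # selected = reconstruction targets
```

Because the two feature views are never hidden for the same token, the encoder retains a partial signal for every token even at mask ratios that would be destructive for a plain attribute mask.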


Cited By

  • (2024) Impact of Domain Knowledge and Multi-Modality on Intelligent Molecular Property Prediction: A Systematic Survey. Big Data Mining and Analytics, 7(3), 858-888. DOI: 10.26599/BDMA.2024.9020028. Online publication date: Sep-2024.
  • (2024) MMPolymer: A Multimodal Multitask Pretraining Framework for Polymer Property Prediction. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2336-2346. DOI: 10.1145/3627673.3679684. Online publication date: 21-Oct-2024.
  • (2024) RARE: Robust Masked Graph Autoencoder. IEEE Transactions on Knowledge and Data Engineering, 36(10), 5340-5353. DOI: 10.1109/TKDE.2023.3335222. Online publication date: Oct-2024.

    Published In

    CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management
    October 2022, 5274 pages
    ISBN: 9781450392365
    DOI: 10.1145/3511808
    General Chairs: Mohammad Al Hasan, Li Xiong
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. graph representation learning
    2. molecular property prediction
    3. self-supervised learning
    4. transformer

    Qualifiers

    • Research-article

    Conference

    CIKM '22

    Acceptance Rates

    CIKM '22 paper acceptance rate: 621 of 2,257 submissions (28%).
    Overall acceptance rate: 1,861 of 8,427 submissions (22%).

