DOI: 10.1145/3637528.3671886
KDD Conference Proceedings · Research article

Advancing Molecule Invariant Representation via Privileged Substructure Identification

Published: 24 August 2024

Abstract

Graph neural networks (GNNs) have revolutionized molecule representation learning by modeling molecules as graphs, with atoms as nodes and chemical bonds as edges. Despite this progress, they struggle in out-of-distribution scenarios, such as changes in the size or scaffold of molecules with identical properties. Some studies attempt to mitigate this issue through graph invariant learning, which penalizes prediction variance across environments in order to learn invariant representations. In the molecular domain, however, core functional groups that form privileged substructures dominate molecular properties and remain invariant across distribution shifts. This highlights the need to integrate this prior knowledge and to ensure that the environment split is compatible with molecule invariant learning. To bridge this gap, we propose a novel framework named MILI. Specifically, we first formalize molecule invariant learning based on privileged substructure identification and introduce a substructure invariance constraint. Building on this foundation, we theoretically establish two criteria for environment splits conducive to molecule invariant learning. Guided by these criteria, we develop a dual-head graph neural network: a shared identifier locates privileged substructures, while the environment and task heads generate predictions from the variant and privileged substructures, respectively. Through the interaction of the two heads, environments are split and optimized to satisfy our criteria. By this design, MILI guarantees that molecule invariant learning and environment splitting mutually reinforce each other, supported by both theoretical analysis and the network architecture. Extensive experiments on eight benchmarks validate the effectiveness of MILI against state-of-the-art baselines.
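The dual-head design described in the abstract can be sketched in code. The following is a minimal, hypothetical illustration, not the authors' implementation: the class name `DualHeadSketch`, the single mean-aggregation layer, and all weight shapes are assumptions. The point is only the routing: a shared identifier produces a soft per-atom substructure mask, the task head reads the masked (privileged) part of the graph, and the environment head reads the variant remainder.

```python
import numpy as np

rng = np.random.default_rng(0)


def mean_aggregate(adj, h):
    """One round of mean-neighbor message passing (with self-loops)."""
    a = adj + np.eye(adj.shape[0])
    return a @ h / a.sum(axis=1, keepdims=True)


def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()


class DualHeadSketch:
    """Hypothetical sketch of a dual-head GNN: a shared identifier scores
    each atom's membership in a privileged substructure; the task head
    reads the masked (privileged) part, the environment head the rest."""

    def __init__(self, dim, n_classes, n_envs):
        self.w_id = rng.normal(scale=0.1, size=(dim, 1))            # shared identifier
        self.w_task = rng.normal(scale=0.1, size=(dim, n_classes))  # task head
        self.w_env = rng.normal(scale=0.1, size=(dim, n_envs))      # environment head

    def forward(self, adj, x):
        h = mean_aggregate(adj, x)                      # node embeddings
        mask = 1.0 / (1.0 + np.exp(-(h @ self.w_id)))   # soft substructure mask in (0, 1)
        z_priv = (mask * h).mean(axis=0)                # privileged-substructure readout
        z_var = ((1.0 - mask) * h).mean(axis=0)         # variant-substructure readout
        y_task = softmax(z_priv @ self.w_task)          # property prediction
        y_env = softmax(z_var @ self.w_env)             # environment assignment
        return mask, y_task, y_env
```

In a full training loop, the environment head's assignments would define the environment split, and an invariance penalty (for example, the variance of the task risk across the inferred environments) would be added to the task loss so that the split and the invariant representation improve together.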



Published In

KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2024
6901 pages
ISBN:9798400704901
DOI:10.1145/3637528

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. environment split
  2. molecule invariant learning
  3. molecule representation learning
  4. privileged substructure identification


Conference

KDD '24

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%


