Abstract
Can we inject pocket-ligand complementarity knowledge into a pre-trained model and jointly learn the chemical spaces of both? Pre-training on molecules and proteins has attracted considerable attention in recent years, but most approaches learn only one of the two chemical spaces and do not consider their complementarity. We propose a co-supervised pre-training (CoSP) framework that learns 3D pocket and ligand representations simultaneously. We use a gated geometric message passing layer to model 3D pockets and ligands, taking into account each node’s chemical features, geometric position, and direction. To learn meaningful biological embeddings, we inject pocket-ligand complementarity into the pre-training model via the ChemInfoNCE loss, combined with a chemical similarity-enhanced negative sampling strategy that improves representation learning. Extensive experiments show that CoSP achieves competitive results in pocket matching, molecule property prediction, and virtual screening.
Z. Gao and C. Tan—Equal Contribution.
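The contrastive objective mentioned in the abstract can be illustrated with a minimal sketch. It shows a generic InfoNCE loss over a batch of paired pocket and ligand embeddings, not the exact ChemInfoNCE formulation, which is not given here. The function name `info_nce`, the `temperature` value, and the optional `neg_weight` matrix (a stand-in for any chemical-similarity-based reweighting of negatives) are assumptions made for illustration.

```python
import numpy as np

def info_nce(pocket_emb, ligand_emb, temperature=0.1, neg_weight=None):
    """Generic InfoNCE over a batch of paired pocket/ligand embeddings.

    Row i of pocket_emb and row i of ligand_emb form a positive pair;
    every other ligand in the batch is a negative for pocket i.
    neg_weight is a hypothetical (B, B) matrix of non-negative weights
    applied to the negative pairs; its diagonal is ignored.
    """
    # L2-normalise so the dot product is a cosine similarity.
    p = pocket_emb / np.linalg.norm(pocket_emb, axis=1, keepdims=True)
    l = ligand_emb / np.linalg.norm(ligand_emb, axis=1, keepdims=True)
    logits = p @ l.T / temperature                       # shape (B, B)
    # A per-row shift leaves the softmax ratio unchanged but avoids overflow.
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    if neg_weight is not None:
        w = np.array(neg_weight, dtype=float)            # copy, do not mutate input
        np.fill_diagonal(w, 1.0)                         # positives keep weight 1
        exp = exp * w
    pos = np.diag(exp)
    return float(np.mean(-np.log(pos / exp.sum(axis=1))))

# Toy batch: four orthogonal embeddings, first matched, then mismatched.
emb = np.eye(4)
aligned = info_nce(emb, emb)
shuffled = info_nce(emb, np.roll(emb, 1, axis=0))
```

In this toy batch the matched pairing yields a much lower loss than the mismatched one, which is the behaviour a complementarity-driven objective relies on. Setting all off-diagonal entries of `neg_weight` to zero removes the negatives entirely and drives the loss to zero.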
Acknowledgements
We thank the authors of previous studies for open-sourcing their code. This work was supported by the National Key R&D Program of China (Project 2022ZD0115100), the National Natural Science Foundation of China (Project U21A20427), and the Research Center for Industries of the Future (Project WU2022C043).
Ethics declarations
Ethical Statement
Our submission does not involve any ethical issues, including but not limited to privacy, security, etc.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gao, Z., Tan, C., Xia, J., Li, S.Z. (2023). Co-supervised Pre-training of Pocket and Ligand. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14169. Springer, Cham. https://doi.org/10.1007/978-3-031-43412-9_24
DOI: https://doi.org/10.1007/978-3-031-43412-9_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43411-2
Online ISBN: 978-3-031-43412-9
eBook Packages: Computer Science, Computer Science (R0)