Abstract
This work focuses on property targeting: generating molecules conditioned on target chemical properties, with the aim of expediting candidate screening for novel drug and materials development. DiGress is a recent diffusion model for molecular graphs whose distinctive feature is that it supports property targeting through classifier-based (CB) guidance. While CB guidance can generate molecule-like graphs, we suggest that its assumptions apply poorly to the chemical domain. Based on this insight, we propose a classifier-free DiGress (FreeGress), which injects the conditioning information directly into the training process. Classifier-free (CF) guidance is convenient because its assumptions are less stringent and because it does not require training an auxiliary property regressor, thus halving the number of trainable parameters in the model. We empirically show that our model yields a significant improvement in Mean Absolute Error over DiGress on property targeting tasks on the QM9 and ZINC-250k benchmarks. As an additional contribution, we propose a simple yet powerful approach to improve the chemical validity of generated samples, based on the observation that certain chemical properties, such as molecular weight, correlate with the number of atoms in a molecule.
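To make the idea of classifier-free guidance concrete, the following is a minimal sketch of the generic mechanism for a single categorical variable (for example, the atom type of one node). It is not taken from FreeGress: the function names, the logits, and the `scale` parameter are illustrative assumptions, and the log-space mixing rule is the standard classifier-free formulation adapted to discrete distributions. The same network is assumed to produce one prediction with the property conditioning and one without it, and the two are combined at sampling time.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def cf_guided_probs(cond_logits, uncond_logits, scale):
    """Combine conditional and unconditional predictions in log space.

    scale = 0 recovers the unconditional distribution, scale = 1 the
    conditional one, and scale > 1 pushes further towards the condition.
    This mixing rule is an assumption for illustration only.
    """
    log_c = log_softmax(cond_logits)
    log_u = log_softmax(uncond_logits)
    mixed = [u + scale * (c - u) for c, u in zip(log_c, log_u)]
    # Renormalise so the result is a valid probability distribution.
    return [math.exp(x) for x in log_softmax(mixed)]

# Hypothetical logits over three atom types for one node.
cond = [2.0, 0.5, -1.0]    # prediction given the target property
uncond = [0.3, 0.2, 0.1]   # prediction with the property masked out
probs = cf_guided_probs(cond, uncond, scale=2.0)
```

No separate regressor appears anywhere in this sketch, which is the practical appeal of the classifier-free approach: guidance strength is controlled by a single scalar applied at sampling time.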
Acknowledgements
Research partly funded by PNRR - M4C2 - Investimento 1.3, Partenariato Esteso PE00000013 - "FAIR - Future Artificial Intelligence Research" - Spoke 1 "Human-centered AI", funded by the European Commission under the NextGeneration EU programme.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ninniri, M., Podda, M., Bacciu, D. (2024). Classifier-Free Graph Diffusion for Molecular Property Targeting. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol 14944. Springer, Cham. https://doi.org/10.1007/978-3-031-70359-1_19
DOI: https://doi.org/10.1007/978-3-031-70359-1_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70358-4
Online ISBN: 978-3-031-70359-1
eBook Packages: Computer Science, Computer Science (R0)