Nc-vae: normalised conditional diverse variational autoencoder guided de novo molecule generation

The Journal of Supercomputing

Abstract

This work proposes a novel, data-assisted approach to drug molecule design. The approach uses a generation-based framework to expedite drug discovery, aiming to identify candidate molecules suitable for production while shortening development timelines and reducing regulatory hurdles. The core of the method is a conditional variational autoencoder (CVAE) that generates molecules in the NCSMILES string representation. The framework has three stages: (1) molecule generation using the CVAE, (2) filtering based on a scoring function, and (3) identification of the optimal molecule from the generated pool. To enhance the latent space representation, we incorporate molecule properties alongside conditional selection criteria. The scheme is evaluated on standard benchmark datasets using four metrics: validity, diversity, usefulness, and novelty. It outperforms existing state-of-the-art approaches, which we attribute to several improvements, including intermediary optimizations and condition-based selection.
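The filtering and selection stages, together with two of the evaluation metrics, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy validity check (balanced parentheses only, not chemical validity), the toy score (string length), and the threshold are all hypothetical stand-ins, and the hard-coded `generated` list stands in for stage 1, the output of a trained CVAE decoder. The metric definitions follow common usage in the molecule-generation literature and may differ from the ones used in the article.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple


def filter_and_select(
    candidates: Iterable[str],
    is_valid: Callable[[str], bool],
    score: Callable[[str], float],
    threshold: float,
) -> Tuple[List[str], Optional[str]]:
    """Stages 2 and 3: keep valid candidates that reach the threshold, then pick the best."""
    kept = [c for c in candidates if is_valid(c) and score(c) >= threshold]
    best = max(kept, key=score) if kept else None
    return kept, best


def validity(candidates: List[str], is_valid: Callable[[str], bool]) -> float:
    """Fraction of generated strings that pass the validity check."""
    if not candidates:
        return 0.0
    return sum(1 for c in candidates if is_valid(c)) / len(candidates)


def novelty(
    candidates: List[str],
    is_valid: Callable[[str], bool],
    training_set: Set[str],
) -> float:
    """Fraction of unique valid strings that do not appear in the training set."""
    valid_unique = {c for c in candidates if is_valid(c)}
    if not valid_unique:
        return 0.0
    return len(valid_unique - training_set) / len(valid_unique)


def toy_is_valid(s: str) -> bool:
    """Hypothetical placeholder: checks balanced parentheses only, NOT chemistry."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(s) and depth == 0


def toy_score(s: str) -> float:
    """Hypothetical placeholder score: longer strings score higher."""
    return float(len(s))


# Stand-in for stage 1: strings a trained CVAE decoder might emit (assumed data).
generated = ["CCO", "CC(C)O", "CC(O", "c1ccccc1", "CCO"]
training = {"CCO"}

kept, best = filter_and_select(generated, toy_is_valid, toy_score, threshold=4.0)
print("validity:", validity(generated, toy_is_valid))
print("novelty:", novelty(generated, toy_is_valid, training))
print("kept:", kept, "best:", best)
```

In practice the validity check would come from a cheminformatics toolkit and the score from property predictors. Both are passed in as callables so that either can be swapped without changing the pipeline.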

Data availability

The data that support the findings of this study are openly available at https://github.com/arunsinghbhadwal/NRC-VABS.

Notes

  1. https://github.com/mkusner/grammarVAE.

  2. https://github.com/Hanjun-Dai/sdvae.

Author information

Corresponding author

Correspondence to Arun Singh Bhadwal.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Research involving human participants and/or animals

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bhadwal, A.S., Kumar, K. Nc-vae: normalised conditional diverse variational autoencoder guided de novo molecule generation. J Supercomput 80, 21207–21228 (2024). https://doi.org/10.1007/s11227-024-06250-2

  • DOI: https://doi.org/10.1007/s11227-024-06250-2
