Towards Exploring Large Molecular Space: An Efficient Chemical Genetic Algorithm

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Generating molecules with desired properties is an important task in chemistry and pharmacy. An efficient generation method could help in finding drugs to treat diseases such as COVID-19, and data mining and artificial intelligence are promising routes to such a method. Recently, both deep generative models and genetic algorithms have made progress in generating molecules and optimizing molecular properties. However, existing methods still leave room for improvement in efficiency and performance. To address these problems, we propose the Chemical Genetic Algorithm for Large Molecular Space (CALM). Specifically, CALM employs a scalable and efficient molecular representation called the molecular matrix. On top of this representation, we design crossover, mutation, and mask operators inspired by domain knowledge and previous studies. We apply our genetic algorithm to several tasks in molecular property optimization and constrained molecular optimization. The results show that our approach outperforms other state-of-the-art deep learning and genetic algorithm methods, and z-tests on the results of several experiments indicate that the improvement is statistically significant at the 99% confidence level. In addition, based on the experimental results, we point out a deficiency in the experimental evaluation standard that prevents a fair evaluation of previous work.
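The abstract does not say which form of z-test was used, so the following is only a minimal sketch of one common variant: a one-sided two-sample z-test comparing the mean scores of two methods. The function name `two_sample_z_test` and all numbers are hypothetical illustrations, not values from the paper.

```python
import math


def two_sample_z_test(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """One-sided two-sample z-test for the alternative mean_a > mean_b.

    Returns the z statistic and the upper-tail p-value under the
    standard normal distribution.
    """
    # Standard error of the difference between the two sample means.
    se = math.sqrt(std_a ** 2 / n_a + std_b ** 2 / n_b)
    z = (mean_a - mean_b) / se
    # Upper-tail probability 1 - Phi(z), written with erfc for accuracy.
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p


# Hypothetical summary statistics for two methods (NOT from the paper).
z, p = two_sample_z_test(mean_a=1.0, std_a=0.5, n_a=50,
                         mean_b=0.7, std_b=0.5, n_b=50)
print(f"z = {z:.2f}, p = {p:.5f}")
print("significant at the 99% level" if p < 0.01 else "not significant")
```

With these made-up inputs, the difference between the means is three standard errors, so the p-value falls below 0.01 and the difference would count as significant at the 99% level.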

Acknowledgement

The authors would like to thank the reviewers for their valuable comments and Dr. Jan H. Jensen for his important corrections.

Author information

Correspondence to Qi Liu.

About this article

Cite this article

Zhu, JF., Hao, ZK., Liu, Q. et al. Towards Exploring Large Molecular Space: An Efficient Chemical Genetic Algorithm. J. Comput. Sci. Technol. 37, 1464–1477 (2022). https://doi.org/10.1007/s11390-021-0970-3

  • DOI: https://doi.org/10.1007/s11390-021-0970-3
