Novel Methodology for Improving the Generalization Capability of Chemo-Informatics Deep Learning Models

Sandjakoska, Ljubinka; Bogdanova, Ana Madevska; Pejov, Ljupcho

doi:10.1007/978-3-031-22792-9_13


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1740)

Included in the following conference series:

International Conference on ICT Innovations


Abstract

In the last decade, the research community has applied deep learning to quite advanced tasks in chemistry, ranging from computational chemistry to materials and drug design and even chemical synthesis problems at both laboratory and industrial scale. Because of its advantages as a high-performance prediction tool in molecular simulations, deep learning is becoming far more than a temporary trend. Instead, it is foreseen as an essential tool for tackling a range of issues in the chemical sciences in the near future. In this paper, we propose a novel methodology for regularization of deep neural networks used in chemo-informatics. The methodology consists of four blocks: Class of initial conditions, Orthogonalization, Activation and Standardization. Three graph-based architectures are developed: a deep tensor neural network, a directed acyclic graph model and a convolutional graph model. Graph-based models are more convenient for modeling molecules, since molecules and their features are often naturally represented by graphs. Several experiments are conducted on datasets from the MoleculeNet aggregator (QM7, QM8, QM9, ToxCast, Tox21, ClinTox, BBBP and SIDER) for predicting geometric, energetic, electronic and thermodynamic properties of small molecules. The obtained results outperform some of the published references and give directions for further improvement. As a particular example, in one of the architectures, we have reduced the mean absolute error by a factor of more than 12 compared to conventional regression models, and by a factor of more than 3 compared to deep networks in which the proposed methodology is not implemented.
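To make the graph view of a molecule and the quoted error metric concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not describe how the architectures or the four regularization blocks are built. The toy molecule, the helper names `aggregate_neighbors` and `mean_absolute_error`, and the unweighted averaging rule are all assumptions chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical toy molecule: the heavy atoms of ethanol (C-C-O), hydrogens omitted.
# Node features are atomic numbers; edges are covalent bonds (an assumption
# made for illustration, not a featurization taken from the paper).
atoms: List[int] = [6, 6, 8]
bonds: Dict[int, List[int]] = {0: [1], 1: [0, 2], 2: [1]}


def aggregate_neighbors(features: List[float],
                        adjacency: Dict[int, List[int]]) -> List[float]:
    """One unweighted message-passing step.

    Each atom's new feature is the mean of its own feature and the
    features of the atoms it is bonded to. Real graph convolutions add
    learned weights and a nonlinearity; both are left out here.
    """
    updated = []
    for node, own in enumerate(features):
        neighborhood = [own] + [features[n] for n in adjacency[node]]
        updated.append(sum(neighborhood) / len(neighborhood))
    return updated


def mean_absolute_error(y_true: List[float], y_pred: List[float]) -> float:
    """Mean absolute error, the regression metric quoted in the abstract."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Propagate atom features once along the bonds of the toy molecule.
hidden = aggregate_neighbors([float(z) for z in atoms], bonds)
```

In this sketch the central carbon mixes information from both of its neighbors, while each terminal atom sees only one. That locality is the property that makes graph models a natural fit for molecules. A reduction in mean absolute error "by a factor of more than 3" would correspond to `mean_absolute_error` returning less than one third of the baseline value on the same test set.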



Author information

Authors and Affiliations

Faculty of Computer Science and Engineering, UIST St. Paul the Apostle, Ohrid, Macedonia
Ljubinka Sandjakoska
Faculty of Computer Science and Engineering, University SS Cyril and Methodius, Skopje, Macedonia
Ana Madevska Bogdanova
Faculty of Natural Sciences and Mathematics, University SS Cyril and Methodius, Skopje, Macedonia
Ljupcho Pejov
Department of Chemistry, Bioscience and Environmental Engineering, Faculty of Science and Technology, University of Stavanger, Stavanger, Norway
Ljupcho Pejov


Corresponding author

Correspondence to Ljubinka Sandjakoska.

Editor information

Editors and Affiliations

Saints Cyril and Methodius University of Skopje, Skopje, North Macedonia
Katerina Zdravkova
Saints Cyril and Methodius University of Skopje, Skopje, North Macedonia
Lasko Basnarkov


About this paper

Cite this paper

Sandjakoska, L., Bogdanova, A.M., Pejov, L. (2022). Novel Methodology for Improving the Generalization Capability of Chemo-Informatics Deep Learning Models. In: Zdravkova, K., Basnarkov, L. (eds) ICT Innovations 2022. Reshaping the Future Towards a New Normal. ICT Innovations 2022. Communications in Computer and Information Science, vol 1740. Springer, Cham. https://doi.org/10.1007/978-3-031-22792-9_13


DOI: https://doi.org/10.1007/978-3-031-22792-9_13
Published: 01 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22791-2
Online ISBN: 978-3-031-22792-9
eBook Packages: Computer Science, Computer Science (R0)
