Abstract
In the last decade, the research community has applied deep learning to quite advanced tasks in chemistry, ranging from computational chemistry to materials and drug design, and even to chemical synthesis problems at both laboratory and industrial scale. Because of its advantages as a high-performance prediction tool in molecular simulations, deep learning is becoming far more than a temporary trend; it is expected to become essential for tackling a range of problems in the chemical sciences in the near future. In this paper, we propose a novel methodology for the regularization of deep neural networks used in chemo-informatics. The methodology consists of four blocks: Class of initial conditions, Orthogonalization, Activation, and Standardization. Three graph-based architectures are developed: a deep tensor neural network, a directed acyclic graph model, and a convolutional graph model. Graph-based models are convenient for modeling molecules, since molecules and their features are often naturally represented by graphs. Experiments are conducted on datasets from the MoleculeNet aggregator (QM7, QM8, QM9, ToxCast, Tox21, ClinTox, BBBP, and SIDER) to predict geometric, energetic, electronic, and thermodynamic properties of small molecules. The results outperform some published references and suggest directions for further improvement. As a particular example, in one of the architectures the mean absolute error is reduced by a factor of more than 12 compared to conventional regression models, and by a factor of more than 3 compared to deep networks in which the proposed methodology is not implemented.
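To make concrete the remark that molecules are naturally represented by graphs, here is a minimal sketch in plain Python. It is an illustration only and not the architecture described in the paper: the molecule (the heavy atoms of ethanol), the choice of atomic number as the single node feature, and the helper names `adjacency_matrix` and `aggregate_neighbors` are all assumptions made for this example.

```python
# Hypothetical illustration: the heavy atoms of ethanol (C-C-O) as a graph.
# Atoms are nodes, bonds are undirected edges.
atoms = ["C", "C", "O"]
atomic_number = {"C": 6, "O": 8}
bonds = [(0, 1), (1, 2)]  # pairs of atom indices


def adjacency_matrix(n_atoms, bonds):
    """Build a symmetric 0/1 adjacency matrix from a list of bonds."""
    adj = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i][j] = 1
        adj[j][i] = 1
    return adj


def aggregate_neighbors(adj, features):
    """One round of sum aggregation: each atom adds its neighbors'
    features to its own. Graph-based networks typically follow such a
    step with learned weights and a nonlinearity, omitted here."""
    n = len(features)
    return [
        features[i] + sum(adj[i][j] * features[j] for j in range(n))
        for i in range(n)
    ]


adj = adjacency_matrix(len(atoms), bonds)
features = [float(atomic_number[a]) for a in atoms]  # assumed node feature
updated = aggregate_neighbors(adj, features)
print(adj)
print(updated)
```

After one aggregation round, each atom's value reflects its immediate chemical neighborhood, which is the basic intuition behind graph-based molecular models.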
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sandjakoska, L., Bogdanova, A.M., Pejov, L. (2022). Novel Methodology for Improving the Generalization Capability of Chemo-Informatics Deep Learning Models. In: Zdravkova, K., Basnarkov, L. (eds) ICT Innovations 2022. Reshaping the Future Towards a New Normal. ICT Innovations 2022. Communications in Computer and Information Science, vol 1740. Springer, Cham. https://doi.org/10.1007/978-3-031-22792-9_13
DOI: https://doi.org/10.1007/978-3-031-22792-9_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22791-2
Online ISBN: 978-3-031-22792-9
eBook Packages: Computer Science (R0)