Summary
The capability of an evolutionary neural network (ENN) based QSAR approach to direct the descriptor selection process towards a stable descriptor subset (DS) composition with acceptable generalization has been examined, together with the influence of description stability on QSAR model interpretation. To analyze DS stability and QSAR model generalization, the dataset was randomly partitioned into training and test sets multiple times. The acceptability criteria proposed by Golbraikh et al. [J. Comput.-Aided Mol. Des., 17 (2003) 241] were used to select highly predictive QSAR models from the set of all models produced by the ENN for each dataset partition, and all ENN-generated models passing Golbraikh's filter were collected into a model pool. Two principles for forming the final DS were compared. The standard principle selects the descriptors with the highest frequencies of occurrence among all descriptors that appear in the pool [J. Chem. Inf. Comput. Sci., 43 (2003) 949]. The novel approach instead searches the model pool for DS that remain stable under repeated dataset subsampling, i.e. universal DS solutions. Based on these principles, a benzodiazepine QSAR model was proposed and compared with results reported by others in terms of final DS composition and predictive performance.
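The Golbraikh acceptability filter used above to screen each model can be sketched in a few lines of Python with NumPy. This is an illustration, not the authors' code: the function name `golbraikh_filter` is hypothetical, q² (the leave-one-out cross-validated value on the training set) is assumed to be computed elsewhere, and the thresholds are those commonly quoted from Golbraikh and Tropsha (q² > 0.5, R² > 0.6, a through-origin R₀² close to R², and a through-origin slope k between 0.85 and 1.15). The exact variant applied in the paper may differ.

```python
import numpy as np


def golbraikh_filter(q2, y_obs, y_pred):
    """Hypothetical helper: True if a model passes the Golbraikh-Tropsha criteria.

    q2     : leave-one-out cross-validated q^2 on the training set (precomputed)
    y_obs  : observed activities of the external test set
    y_pred : model predictions for the same test molecules
    """
    y = np.asarray(y_obs, dtype=float)
    yp = np.asarray(y_pred, dtype=float)

    # Squared Pearson correlation between observed and predicted activities
    r2 = np.corrcoef(y, yp)[0, 1] ** 2

    # Slopes of the regression lines forced through the origin
    k = np.sum(y * yp) / np.sum(yp ** 2)        # observed vs predicted
    k_prime = np.sum(y * yp) / np.sum(y ** 2)   # predicted vs observed

    # Coefficients of determination of the two through-origin fits
    r0 = 1 - np.sum((y - k * yp) ** 2) / np.sum((y - y.mean()) ** 2)
    r0_prime = 1 - np.sum((yp - k_prime * y) ** 2) / np.sum((yp - yp.mean()) ** 2)

    return bool(
        q2 > 0.5
        and r2 > 0.6
        and ((r2 - r0) / r2 < 0.1 or (r2 - r0_prime) / r2 < 0.1)
        and (0.85 <= k <= 1.15 or 0.85 <= k_prime <= 1.15)
    )
```

For example, a model whose test-set predictions are systematically twice the observed values has R² = 1 but still fails, because both slopes (0.5 and 2) fall outside the 0.85 to 1.15 window. This is why the filter is stricter than a plain correlation check.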
Abbreviations
- QSAR: quantitative structure-activity relationship
- descriptor: attribute
- molecule: object
- input variable: independent variable
- output variable: dependent variable
- m–n–p: fully connected NN topology with biases, in which the input layer contains m input neurons, the hidden layer contains n hidden neurons and the output layer contains p neurons
- ENN: evolutionary neural networks
- GNN: genetic neural networks
- NN: neural networks
- DS: descriptor subset(s)
- LOO: leave-one-out
- EA: evolutionary algorithm
- CPU: central processing unit
- PC: personal computer
- FF: fitness function, also called objective or merit function
- MSE: mean squared error
- SSE: sum of squared errors
- RMSE: root mean squared error
- CV: coefficient of variation (relative standard deviation)
- LMO: leave-many-out
- ROC: receiver operating characteristic
- SCG: scaled conjugate gradient
- SNNS: Stuttgart neural network simulator
- GABA: γ-aminobutyric acid
- IC: inhibitory concentration
- 〈X〉: average X value
- n_tot: total number of molecules
- n_part: number of complete dataset partitions into training and external validation sets
- n_test: number of external validation set molecules
- n_subd: number of internal validation sets
- T: critical fraction of the n_part complete dataset partitions in which a specific DS fails at least one of the predictive performance filters
- p(α): probability of type I statistical error
- p(β): probability of type II statistical error
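The two final DS forming principles described in the Summary, together with the critical fraction T defined above, can be contrasted with a small Python sketch. It is not the authors' implementation. Assumptions made for illustration: the pool is a list with one entry per dataset partition, each entry being a set of `frozenset` descriptor subsets from models that passed the filters; the function names `frequency_based_ds` and `universal_ds` are hypothetical; a DS absent from a partition's pool is counted as failing there; and a DS is kept as "universal" when its failure fraction does not exceed T.

```python
from collections import Counter


def frequency_based_ds(model_pool, size):
    """Standard principle (sketch): rank individual descriptors by how often they
    occur across all accepted models of all partitions and keep the top `size`."""
    counts = Counter(d for partition in model_pool for ds in partition for d in ds)
    return sorted(d for d, _ in counts.most_common(size))


def universal_ds(model_pool, T):
    """Stability-based principle (sketch): keep whole descriptor subsets whose
    failure fraction over all partitions does not exceed T.

    Assumption: a DS missing from a partition's pool failed the filters there."""
    n_part = len(model_pool)
    candidates = set().union(*model_pool)
    stable = {}
    for ds in candidates:
        failures = sum(1 for partition in model_pool if ds not in partition)
        fraction = failures / n_part
        if fraction <= T:
            stable[ds] = fraction
    return stable


# Toy pool with four partitions (hypothetical descriptors a, b, c, d)
pool = [
    {frozenset({"a", "b"}), frozenset({"a", "c"})},
    {frozenset({"a", "b"})},
    {frozenset({"a", "b"}), frozenset({"c", "d"})},
    {frozenset({"a", "b"}), frozenset({"a", "c"})},
]

print(frequency_based_ds(pool, 2))  # ['a', 'b']
# Keeps {a, b} (fraction 0.0) and {a, c} (fraction 0.5); {c, d} fails in 3 of 4 partitions
print(universal_ds(pool, T=0.5))
```

The key design difference is the unit of selection: the frequency principle scores descriptors one at a time, so its top-ranked descriptors need not ever have appeared together in an accepted model, whereas the stability principle scores complete subsets and therefore only returns combinations that actually passed the filters across most partitions.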
References
Guyon I., Elisseeff A., (2003) J. Mach. Learn. Res. 3:1157
Molina L.C., Belanche L., Nebot A., (2002) IEEE International Conference on Data Mining, Maebashi City, Japan, September 9–12, 2002
So S.S., Karplus M., (1996) J. Med. Chem. 39:1521
So S.S., Karplus M., (1996) J. Med. Chem. 39:5246
So S.S., Karplus M., (1997) J. Med. Chem. 40:4347
So S.S., Karplus M., (1997) J. Med. Chem. 40:4360
So S.S., van Helden S.P., van Geerestein V.J., Karplus M., (2000) J. Chem. Inf. Comput. Sci. 40:762
Kyngäs J., Valjakka, (1996) J. Quant. Struct.-Act. Relat. 15:296
Patankar S.J., Jurs P.C., (2000) J. Chem. Inf. Comput. Sci. 40:706
Patankar S.J., Jurs P.C., (2002) J. Chem. Inf. Comput. Sci. 42:1053
Mattioni B.E., Kauffman G.W., Jurs P.C., Custer L.L., Durham S.K., Pearl G.M., (2003) J. Chem. Inf. Comput. Sci. 43:949
Hemmateenejad B., Akhond M., Miri R., Shamsipur M., (2003) J. Chem. Inf. Comput. Sci. 43:1328
Kohavi R., (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection, Technical Report, Computer Science Department, Stanford University, Stanford
Lunneborg C.E., (2000) Data Analysis by Resampling: Concepts and Applications, Duxbury Press, Pacific Grove
Breiman L., (2001) Statist. Sci. 16:199
Breiman L., (1994) Bagging predictors, Technical Report, Department of Statistics, University of California, Berkeley
Breiman L., (2001) Mach. Learn. 45:5
Tetko I.V., Livingstone D.J., Luik A.I., (1995) J. Chem. Inf. Comput. Sci. 35:826
Yao X., Liu Y., (1998) IEEE Trans. Syst. Man. Cybern. B Cybern. 28:417
Leardi R., Lupianez Gonzalez A., (1998) Chemometr. Intell. Lab. Syst. 41:195
Baumann K., (2003) Trends Anal. Chem. 22:395
Kohavi R., Sommerfield D., (1995) Feature subset selection using the wrapper method: overfitting and dynamic search space topology, Technical Report, Computer Science Department, Stanford University, Stanford
Baumann K., Albert H., von Korff M., (2002) J. Chemomet. 16:339
Baumann K., von Korff M., Albert H., (2002) J. Chemomet. 16:351
Golbraikh A., Tropsha A., (2002) J. Mol. Graphics Model. 20:269
Tropsha A., Gramatica P., Gombar V.K., (2003) QSAR Comb. Sci. 22:69
Golbraikh A., Shen M., Xiao Z., Xiao Y.-D., Lee K.-H., Tropsha A., (2003) J. Comput.-Aided Mol. Des. 17:241
Hawkins D.M., Basak S.C., Mills D., (2003) J. Chem. Inf. Comput. Sci. 43:579
Todeschini R., Consonni V., Mauri A., Pavan M., (2004) Anal. Chim. Acta 515:199
Clark R.D., (2003) J. Comput.-Aided Mol. Des. 17:265
Shen Q., Jiang J.-H., Shen G.-L., Yu R.-Q., (2003) Anal. Bioanal. Chem. 375:248
Maddalena D.J., Johnston G.A.R., (1995) J. Med. Chem. 38:715
Aoyama T., Suzuki Y., Ichikawa H., (1990) J. Med. Chem. 33:2583
Abraham A., (2002) Optimization of Evolutionary Neural Networks Using Hybrid Learning Algorithms, Technical Report, School of Business Systems, Monash University
Zell A. (Ed.), (1998) Stuttgart Neural Network Simulator User Manual, Version 4.2, University of Stuttgart and University of Tübingen
Haykin S., (1999) Neural Networks: A Comprehensive Foundation, 2nd edn., Prentice-Hall, Upper Saddle River, NJ
Gasteiger J., Zupan J., (1993) Angew. Chem. Int. Ed. Engl. 32:503
Møller M.F., (1993) Neural Networks 6:525
Chiu T.-L., So S.S., (2003) QSAR Comb. Sci. 22:519
Wagener M., Sadowski J., Gasteiger J., (1995) J. Am. Chem. Soc. 117:7769
Dudewicz E.J., Mishra S.N., (1988) Modern Mathematical Statistics, John Wiley and Sons, New York
Svetnik V., Liaw A., Tong C., Culberson J.C., Sheridan R.P., Feuston B.P., (2003) J. Chem. Inf. Comput. Sci. 43:1947
Kohavi R., John G.H., (1997) Artif. Intell. 97:273
Acknowledgement
This work was supported by the Ministry of Science and Technology of the Republic of Croatia through Grant 0006541.
Cite this article
Debeljak, Ž., Marohnić, V., Srečnik, G. et al. Novel approach to evolutionary neural network based descriptor selection and QSAR model development. J Comput Aided Mol Des 19, 835–855 (2005). https://doi.org/10.1007/s10822-005-9022-2
DOI: https://doi.org/10.1007/s10822-005-9022-2