Novel approach to evolutionary neural network based descriptor selection and QSAR model development

Journal of Computer-Aided Molecular Design

Summary

The capability of an evolutionary neural network (ENN) based QSAR approach to direct the descriptor selection process towards a stable descriptor subset (DS) composition with acceptable generalization, as well as the influence of DS stability on QSAR model interpretation, has been examined. To analyze DS stability and QSAR model generalization properties, the dataset was randomly partitioned into training and test sets multiple times. The acceptability criteria proposed by Golbraikh et al. [J. Comput.-Aided Mol. Des., 17 (2003) 241] were used to select highly predictive QSAR models from the set of all models produced by the ENN for each dataset partition. All ENN-generated QSAR models that passed Golbraikh’s filter were collected across the dataset partitions. Two principles for forming the final DS were compared. The standard principle selects the descriptors with the highest frequencies among all descriptors that appear in the model pool [J. Chem. Inf. Comput. Sci., 43 (2003) 949]. The novel approach instead searches the model pool for DS that are stable against multiple dataset subsampling, i.e. universal DS solutions. Based on the described principles, a benzodiazepine QSAR has been proposed and evaluated against results reported by others in terms of final DS composition and model predictive performance.
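The difference between the two DS forming principles can be illustrated with a small sketch. It is not the authors' implementation: the model pool, the descriptor names, and the `max_fail_fraction` parameter (a stand-in loosely modelled on the threshold T listed under Abbreviations) are all hypothetical, and every pool entry is simply assumed to have already passed the predictive-performance filter.

```python
from collections import Counter

# Hypothetical model pool: (partition index, descriptor subset) for each model
# assumed to have passed the predictive-performance filter. Names are invented.
A = frozenset({"logP", "MR", "dipole"})
pool = [
    (0, A), (1, A), (2, A),
    (0, frozenset({"logP", "HOMO"})),
    (1, frozenset({"MR", "HOMO"})),
    (2, frozenset({"logP", "HOMO"})),
    (2, frozenset({"MR", "HOMO"})),
]
n_part = 3  # number of dataset partitions (assumed)


def frequency_based_ds(pool, size):
    """Frequency principle: pick the `size` most frequent single descriptors."""
    counts = Counter(d for _, ds in pool for d in ds)
    # Sort by descending count, then by name, so ties break deterministically.
    ranked = sorted(counts, key=lambda d: (-counts[d], d))
    return frozenset(ranked[:size])


def stable_ds(pool, n_part, max_fail_fraction):
    """Stability principle: keep whole subsets that are missing from at most
    `max_fail_fraction` of the partitions."""
    partitions_passed = {}
    for part, ds in pool:
        partitions_passed.setdefault(ds, set()).add(part)
    return {
        ds
        for ds, parts in partitions_passed.items()
        if (n_part - len(parts)) / n_part <= max_fail_fraction
    }


print(sorted(frequency_based_ds(pool, 3)))
print([sorted(ds) for ds in stable_ds(pool, n_part, 0.0)])
```

In this toy pool the frequency principle assembles a subset from individually popular descriptors that never occurred together in any pooled model, while the stability principle returns only a subset that was found as a whole in every partition.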

Abbreviations

QSAR: quantitative structure-activity relationship

descriptor: attribute

molecule: object

input variable: independent variable

output variable: dependent variable

m–n–p: fully connected NN topology with biases, where the input layer contains m input neurons, the hidden layer contains n hidden neurons, and the output layer contains p neurons

ENN: evolutionary neural networks

GNN: genetic neural networks

NN: neural networks

DS: descriptor subsets

LOO: leave one out

EA: evolutionary algorithm

CPU: central processing unit

PC: personal computer

FF: fitness function, also called objective or merit function

MSE: mean squared error

SSE: sum of squared errors

RMSE: root mean squared error

CV: coefficient of variation (relative standard deviation)

LMO: leave many out

ROC: receiver operating characteristic

SCG: scaled conjugate gradient

SNNS: Stuttgart neural network simulator

GABA: γ-aminobutyric acid

IC: inhibitory concentration

〈X〉: average X value

n_tot: total number of molecules

n_part: number of complete dataset partitions into training and external validation set

n_test: number of external validation set molecules

n_subd: number of internal validation sets

T: critical n_part fraction corresponding to the number of complete dataset partitions for which a specific DS fails to pass at least one of the predictive performance filters

p(α): probability of type I statistical error

p(β): probability of type II statistical error
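For readers less familiar with the error measures abbreviated above, the following is a minimal sketch of their textbook definitions. The function names are illustrative, and the use of the sample standard deviation in the coefficient of variation is an assumption; this page does not say which variant the authors used or which quantities the CV was applied to.

```python
import math
import statistics


def sse(y_true, y_pred):
    """Sum of squared errors between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


def mse(y_true, y_pred):
    """Mean squared error: SSE divided by the number of observations."""
    return sse(y_true, y_pred) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def coefficient_of_variation(values):
    """Relative standard deviation: sample standard deviation over the mean
    (sample rather than population deviation is an assumption here)."""
    return statistics.stdev(values) / statistics.mean(values)


# Made-up observed and predicted activities, for illustration only.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
print(sse(observed, predicted), mse(observed, predicted), rmse(observed, predicted))
print(coefficient_of_variation([2.0, 4.0, 6.0]))
```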

References

  1. Guyon I., Elisseeff A., (2003) J. Mach. Learn. Res. 3:1157

  2. Molina, L.C., Belanche, L. and Nebot, A., 2002 IEEE International Conference on Data Mining, Maebashi City, Japan, September 09–12, 2002

  3. So S.S., Karplus M., (1996) J. Med. Chem. 39:1521

  4. So S.S., Karplus M., (1996) J. Med. Chem. 39:5246

  5. So S.S., Karplus M., (1997) J. Med. Chem. 40:4347

  6. So S.S., Karplus M., (1997) J. Med. Chem. 40:4360

  7. So S.S., van Helden S.P., van Geerestein V.J., Karplus M., (2000) J. Chem. Inf. Comput. Sci. 40:762

  8. Kyngäs J., Valjakka, (1996) J. Quant. Struct.-Act. Relat. 15:296

  9. Patankar S.J., Jurs P.C., (2000) J. Chem. Inf. Comput. Sci. 40:706

  10. Patankar S.J., Jurs P.C., (2002) J. Chem. Inf. Comput. Sci. 42:1053

  11. Mattioni B.E., Kauffman G.W., Jurs P.C., Custer L.L., Durham S.K., Pearl G.M., (2003) J. Chem. Inf. Comput. Sci. 43:949

  12. Hemmateenejad B., Akhond M., Miri R., Shamsipur M., (2003) J. Chem. Inf. Comput. Sci. 43:1328

  13. Kohavi, R., A study of cross-validation and bootstrap for accuracy estimation and model selection, Technical Report. Computer Science Department, Stanford University, Stanford, 1995

  14. Lunneborg, C.E., Data Analysis by Resampling: Concepts and Applications. Duxbury Press, Pacific Grove, 2000

  15. Breiman L., (2001) Statist. Sci. 16:199

  16. Breiman, L., Bagging predictors, Technical Report. Department of Statistics, University of California, Berkeley, 1994

  17. Breiman L., (2001) Mach. Learn. 45:5

  18. Tetko I.V., Livingstone D.J., Luik A.I., (1995) J. Chem. Inf. Comput. Sci. 35:826

  19. Yao X., Liu Y., (1998) IEEE Trans. Syst. Man. Cybern. B Cybern. 28:417

  20. Leardi R., Lupianez Gonzalez A., (1998) Chemometr. Intell. Lab. Syst. 41:195

  21. Baumann K., (2003) Trends Anal. Chem. 22:395

  22. Kohavi, R. and Sommerfeld, D., Feature subset selection using wrapper method: overfitting and dynamic search space topology, Technical Report. Computer Science Department, Stanford University, Stanford, 1995

  23. Baumann K., Albert H., von Korff M., (2002) J. Chemomet. 16:339

  24. Baumann K., von Korff M., Albert H., (2002) J. Chemomet. 16:351

  25. Golbraikh A., Tropsha A., (2002) J. Mol. Graphics Model. 20:269

  26. Tropsha A., Gramatica P., Gombar V.K., (2003) QSAR Comb. Sci. 22:69

  27. Golbraikh A., Shen M., Xiao Z., Xiao Y.-D., Lee K.-H., Tropsha A., (2003) J. Comput.-Aided Mol. Des. 17:241

  28. Hawkins D.M., Basak S.C., Mills D., (2003) J. Chem. Inf. Comput. Sci. 43:579

  29. Todeschini R., Consonni V., Mauri A., Pavan M., (2004) Anal. Chim. Acta 515:199

  30. Clark R.D., (2003) J. Comput.-Aided Mol. Des. 17:265

  31. Shen Q., Jiang J.-H., Shen G.-L., Yu R.-Q., (2003) Anal. Bioanal. Chem. 375:248

  32. Maddalena D.J., Johnston G.A.R., (1995) J. Med. Chem. 38:715

  33. Aoyama T., Suzuki Y., Ichikawa H., (1990) J. Med. Chem. 33:2583

  34. Abraham, A., Optimization of Evolutionary Neural Networks Using Hybrid Learning Algorithms, Technical Report, School of Business Systems, Monash University, 2002

  35. Zell, A. (Ed.), Stuttgart Neural Network Simulator User Manual, Version 4.2. University of Stuttgart and University of Tübingen, 1998

  36. Haykin S., 1999 Neural Networks: A Comprehensive Foundation, 2nd ed., Prentice-Hall Inc.: Upper Saddle River, NJ

  37. Gasteiger J., Zupan J., (1993) Angew. Chem. Int. Ed. Engl. 32:503

  38. Møller M.F., (1993) Neural Networks 6:525

  39. Chiu T.-L., So S.S., (2003) QSAR Comb. Sci. 22:519

  40. Wagener M., Sadowski J., Gasteiger J., (1995) J. Am. Chem. Soc. 117:7769

  41. Dudewicz E.J., Mishra S.N., 1988 Modern Mathematical Statistics, John Wiley and Sons, New York

  42. Svetnik V., Liaw A., Tong C., Culberson J.C., Sheridan R.P., Feuston B.P., (2003) J. Chem. Inf. Comput. Sci. 43:1947

  43. Kohavi R., John G.H., (1997) Artif. Intell. 97:273

Acknowledgement

This work was supported by the Ministry of Science and Technology of the Republic of Croatia through Grant 0006541.

Author information

Corresponding author

Correspondence to Željko Debeljak.

About this article

Cite this article

Debeljak, Ž., Marohnić, V., Srečnik, G. et al. Novel approach to evolutionary neural network based descriptor selection and QSAR model development. J Comput Aided Mol Des 19, 835–855 (2005). https://doi.org/10.1007/s10822-005-9022-2

  • DOI: https://doi.org/10.1007/s10822-005-9022-2
