
Application of Breiman’s Random Forest to Modeling Structure-Activity Relationships of Pharmaceutical Molecules

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3077)

Abstract

Leo Breiman’s Random Forest ensemble learning procedure is applied to the problem of Quantitative Structure-Activity Relationship (QSAR) modeling for pharmaceutical molecules. This entails using a quantitative description of a compound’s molecular structure to predict that compound’s biological activity as measured in an in vitro assay. Without any parameter tuning, Random Forest with default settings already performs as well as or better than three other prominent QSAR methods (Decision Tree, Partial Least Squares, and Support Vector Machine) on six publicly available data sets. In addition to reliable prediction accuracy, Random Forest provides variable importance measures that can be used in a variable reduction wrapper algorithm. Comparisons among several such wrappers, and between Random Forest and Bagging, are presented.
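
The abstract combines three ingredients: a Random Forest run with default settings, its variable importance measure, and a wrapper that uses that measure to discard descriptors, with Bagging as a point of comparison. The pure-Python sketch below illustrates how those pieces fit together; it is not the authors' implementation. Assumptions, all ours: binary classification with integer labels, descriptor rows stored as Python lists, unpruned Gini-split trees, a default of floor(sqrt(p)) candidate descriptors per split (`mtry`), importance taken as the mean per-tree increase in out-of-bag (OOB) error after permuting one descriptor (a simplified form of Breiman's permutation measure), and a hypothetical wrapper that halves the descriptor set each round and recomputes importance after each refit. Every function name here is illustrative.

```python
import random
from collections import Counter

# Minimal sketch; all names are illustrative, not taken from the paper.


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def build_tree(X, y, idx, mtry, rng):
    """Grow an unpruned tree on rows `idx`, searching only `mtry` randomly
    chosen descriptors at each node (the Random Forest twist on CART)."""
    labels = [y[i] for i in idx]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return majority  # pure node becomes a leaf
    best = None
    for f in rng.sample(range(len(X[0])), mtry):
        for t in sorted({X[i][f] for i in idx})[:-1]:  # keeps both sides non-empty
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority  # every sampled descriptor is constant on these rows
    _, f, t, left, right = best
    return (f, t, build_tree(X, y, left, mtry, rng), build_tree(X, y, right, mtry, rng))


def predict_tree(node, x):
    """Follow (feature, threshold, left, right) splits down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=100, mtry=None, seed=0):
    """Trees on bootstrap samples; passing mtry=len(X[0]) gives plain Bagging."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = mtry or max(1, int(p ** 0.5))  # usual classification default
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]
        oob = sorted(set(range(n)) - set(boot))  # rows this tree never saw
        forest.append((build_tree(X, y, boot, mtry, rng), oob))
    return forest


def oob_error(forest, X, y):
    """Misclassification rate of the majority vote over out-of-bag trees."""
    votes = [Counter() for _ in X]
    for tree, oob in forest:
        for i in oob:
            votes[i][predict_tree(tree, X[i])] += 1
    scored = [i for i in range(len(X)) if votes[i]]
    wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return wrong / len(scored) if scored else 0.0


def permutation_importance(forest, X, y, rng):
    """Average over trees of the OOB error increase after shuffling one column."""
    imp = [0.0] * len(X[0])
    for tree, oob in forest:
        if not oob:
            continue  # a tree with no OOB rows contributes nothing
        base = sum(predict_tree(tree, X[i]) != y[i] for i in oob)
        for j in range(len(imp)):
            shuffled = [X[i][j] for i in oob]
            rng.shuffle(shuffled)
            perm = sum(predict_tree(tree, X[i][:j] + [v] + X[i][j + 1:]) != y[i]
                       for i, v in zip(oob, shuffled))
            imp[j] += (perm - base) / len(oob)
    return [v / len(forest) for v in imp]


def reduce_variables(X, y, n_trees=50, seed=0):
    """Hypothetical wrapper: refit, re-rank, keep the top half, repeat."""
    rng = random.Random(seed)
    keep, path = list(range(len(X[0]))), []
    while True:
        Xk = [[row[j] for j in keep] for row in X]
        forest = fit_forest(Xk, y, n_trees, seed=rng.randrange(10 ** 9))
        path.append((list(keep), oob_error(forest, Xk, y)))
        if len(keep) == 1:
            return path
        imp = permutation_importance(forest, Xk, y, rng)
        top = sorted(range(len(keep)), key=lambda k: imp[k], reverse=True)
        keep = sorted(keep[k] for k in top[: len(keep) // 2])
```

For the Random Forest versus Bagging comparison, `fit_forest(X, y, mtry=len(X[0]))` lets every split consider all descriptors, which is ordinary bagged trees. Note that `reduce_variables` uses the same OOB errors both to rank descriptors and to score each subset, so the smallest error along `path` is an optimistic estimate; an honest assessment would repeat the whole wrapper inside an outer cross-validation loop.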

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Svetnik, V., Liaw, A., Tong, C., Wang, T. (2004). Application of Breiman’s Random Forest to Modeling Structure-Activity Relationships of Pharmaceutical Molecules. In: Roli, F., Kittler, J., Windeatt, T. (eds) Multiple Classifier Systems. MCS 2004. Lecture Notes in Computer Science, vol 3077. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25966-4_33

  • DOI: https://doi.org/10.1007/978-3-540-25966-4_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22144-9

  • Online ISBN: 978-3-540-25966-4

  • eBook Packages: Springer Book Archive
