Skip to main content
Log in

Comparing classification models—a practical tutorial

  • Published:
Journal of Computer-Aided Molecular Design Aims and scope Submit manuscript

Abstract

While machine learning models have become a mainstay in Cheminformatics, the field has yet to agree on standards for model evaluation and comparison. In many cases, authors compare methods by performing multiple folds of cross-validation and reporting the mean value for an evaluation metric such as the area under the receiver operating characteristic. These comparisons of mean values often lack statistical rigor and can lead to inaccurate conclusions. In the interest of encouraging best practices, this tutorial provides an example of how multiple methods can be compared in a statistically rigorous fashion.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

Data availability

All data is available in a GitHub repository.

Code availability

All code is available in a GitHub repository.

References

  1. Walters WP, Barzilay R (2021) Critical assessment of AI in drug discovery. Expert Opin Drug Discov. https://doi.org/10.1080/17460441.2021.1915982

    Article  PubMed  Google Scholar 

  2. Elton DC, Boukouvalas Z, Fuge MD, Chung PW (2019) Deep learning for molecular design—a review of the state of the art. Mol Syst Des Eng 4:828–849

    Article  CAS  Google Scholar 

  3. Bender A, Cortés-Ciriano I (2021) Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 1: ways to make an impact, and why we are not there yet. Drug Discov Today 26:511–524

    Article  CAS  Google Scholar 

  4. Bender A, Cortes-Ciriano I (2021) Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 2: a discussion of chemical and biological data. Drug Discov Today. https://doi.org/10.1016/j.drudis.2020.11.037

    Article  PubMed  PubMed Central  Google Scholar 

  5. Vamathevan J, Clark D, Czodrowski P et al (2019) Applications of machine learning in drug discovery and development. Nat Rev Drug Discov. https://doi.org/10.1038/s41573-019-0024-5

    Article  PubMed  PubMed Central  Google Scholar 

  6. Nicholls A (2011) What do we know?: simple statistical techniques that help. Methods Mol Biol 672:531–581

    Article  CAS  Google Scholar 

  7. Jain AN, Nicholls A (2008) Recommendations for evaluation of computational methods. J Comput Aided Mol Des 22:133–139

    Article  CAS  Google Scholar 

  8. Nicholls A (2014) Confidence limits, error bars and method comparison in molecular modeling. Part 1: the calculation of confidence intervals. J Comput Aided Mol Des 28:887–918

    Article  CAS  Google Scholar 

  9. Nicholls A (2016) Confidence limits, error bars and method comparison in molecular modeling. Part 2: comparing methods. J Comput Aided Mol Des 30:103–126

    Article  CAS  Google Scholar 

  10. Jamieson C, Moir EM, Rankovic Z, Wishart G (2008) Strategy and tactics for hERG optimizations. Antitargets. Wiley, Hoboken, pp 423–455

    Chapter  Google Scholar 

  11. Gaulton A, Bellis LJ, Bento AP et al (2011) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 40:D1100–D1107

    Article  Google Scholar 

  12. Bento AP, Gaulton A, Hersey A et al (2013) The ChEMBL bioactivity database: an update. Nucleic Acids Res 42:D1083–D1090

    Article  Google Scholar 

  13. jcamd_model_comparison. Available at https://github.com/PatWalters/jcamd_model_comparison

  14. Czodrowski P (2013) hERG me out. J Chem Inf Model 53:2240–2251

    Article  CAS  Google Scholar 

  15. McKinney W (2017) Python for data analysis: data wrangling with pandas, NumPy, and IPython. O’Reilly Media, Incorporated, Sebastopol

    Google Scholar 

  16. Esposito C, Landrum GA, Schneider N et al (2021) GHOST: adjusting the decision threshold to handle imbalanced data in machine learning. J Chem Inf Model. https://doi.org/10.1021/acs.jcim.1c00160

    Article  PubMed  Google Scholar 

  17. Cáceres EL, Mew NC, Keiser MJ (2020) Adding stochastic negative examples into machine learning improves molecular bioactivity prediction. J Chem Inf Model. https://doi.org/10.1021/acs.jcim.0c00565

    Article  PubMed  Google Scholar 

  18. Lopez-Del Rio A, Picart-Armada S, Perera-Lluna A (2021) Balancing data on deep learning-based proteochemometric activity classification. J Chem Inf Model 61:1657–1669

    Article  CAS  Google Scholar 

  19. Svetnik V, Liaw A, Tong C et al (2003) Random forest: a classification and regression tool for compound classification and QSAR modeling. J Chem Inf Comput Sci 43:1947–1958

    Article  CAS  Google Scholar 

  20. Sheridan RP, Liaw A, Tudor M (2021) Light gradient boosting machine as a regression method for quantitative structure-activity relationships. arXiv [q-bio.BM]

  21. RDKit: open-source cheminformatics software. Available at https://github.com/rdkit/rdkit. Accessed 28 Feb 2021

  22. Rogers D, Hahn M (2010) Extended-connectivity fingerprints. J Chem Inf Model 50:742–754

    Article  CAS  Google Scholar 

  23. Truchon J-F, Bayly CI (2007) Evaluating virtual screening methods: good and bad metrics for the “early recognition” problem. J Chem Inf Model 47:488–508

    Article  CAS  Google Scholar 

  24. Nicholls A (2008) What do we know and when do we know it? J Comput Aided Mol Des 22:239–255

    Article  CAS  Google Scholar 

  25. Dietterich TG (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 10:1895–1923

    Article  CAS  Google Scholar 

  26. Mlxtend. Available at http://rasbt.github.io/mlxtend/

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to W. Patrick Walters.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

For the special issue in honor of Gerry Maggiora.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Patrick Walters, W. Comparing classification models—a practical tutorial. J Comput Aided Mol Des 36, 381–389 (2022). https://doi.org/10.1007/s10822-021-00417-2

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10822-021-00417-2

Keywords

Navigation