Rethinking the applicability domain analysis in QSAR models

Journal of Computer-Aided Molecular Design

Abstract

Despite the wide adoption of the OECD principles (or best practices) for QSAR modeling, disparities between in silico predictions and experimental results are frequent, suggesting that model predictions are often too optimistic. Of these principles, applicability domain (AD) estimation has been recognized in several literature reports as one of the most challenging, implying that the reliability measures attached to model predictions are themselves often unreliable. Applying tree-based error analysis workflows to five QSAR models reported in the literature and available in the QsarDB repository, namely androgen receptor bioactivity (agonists, antagonists, and binders) and membrane permeability (highest membrane permeability and intrinsic permeability), we demonstrate that predictions erroneously tagged as reliable (AD prediction errors) overwhelmingly correspond to instances in subspaces (cohorts) with the highest prediction error rates, highlighting the inhomogeneity of the AD space. We therefore call for more stringent AD analysis guidelines that require the incorporation of model error analysis schemes, to provide critical insight into the reliability of the underlying AD algorithms. Additionally, any selected AD method should be rigorously validated to demonstrate its suitability for the model space over which it is applied. These steps will ultimately contribute to more accurate estimates of the reliability of model predictions. Finally, error analysis may also be useful in “rational” model refinement, in which data expansion efforts and model retraining are focused on cohorts with the highest error rates.
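The idea of tree-based error analysis can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`error_cohorts`, `summarize`), the two toy descriptors, the error flags, and the AD flags below are all hypothetical and chosen only for illustration. A shallow tree is grown on the model's error indicator (1 = wrong prediction), each leaf is treated as a cohort, and the predictions that were wrong yet tagged as inside the AD are then counted per cohort.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    rule: str      # human-readable path from the root to this leaf
    indices: list  # indices of the samples that fall into this leaf


def gini(flags):
    """Gini impurity of a list of 0/1 error flags."""
    if not flags:
        return 0.0
    p = sum(flags) / len(flags)
    return 2.0 * p * (1.0 - p)


def best_split(X, err, idx, min_leaf):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini([err[i] for i in idx])
    for f in range(len(X[0])):
        values = sorted({X[i][f] for i in idx})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = (len(left) * gini([err[i] for i in left])
                     + len(right) * gini([err[i] for i in right])) / len(idx)
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best


def error_cohorts(X, err, max_depth=1, min_leaf=2, idx=None, rule="all"):
    """Recursively partition the samples; each leaf is returned as a Cohort."""
    idx = list(range(len(X))) if idx is None else idx
    split = best_split(X, err, idx, min_leaf) if max_depth > 0 else None
    if split is None:
        return [Cohort(rule, idx)]
    f, t = split
    left = [i for i in idx if X[i][f] <= t]
    right = [i for i in idx if X[i][f] > t]
    return (error_cohorts(X, err, max_depth - 1, min_leaf, left, f"{rule} & x{f}<={t:g}")
            + error_cohorts(X, err, max_depth - 1, min_leaf, right, f"{rule} & x{f}>{t:g}"))


def summarize(cohorts, err, in_ad):
    """Per-cohort size, error rate, and count of errors tagged as inside the AD."""
    rows = []
    for c in cohorts:
        n_err = sum(err[i] for i in c.indices)
        n_ad_err = sum(1 for i in c.indices if err[i] and in_ad[i])
        rows.append({"rule": c.rule, "n": len(c.indices),
                     "error_rate": n_err / len(c.indices), "ad_errors": n_ad_err})
    return sorted(rows, key=lambda r: r["error_rate"], reverse=True)


# Hypothetical toy data: two descriptors per compound, an error flag per
# prediction (1 = model was wrong), and an AD flag (True = tagged reliable).
X = [[0.1, 1.0], [0.2, 2.0], [0.3, 1.5], [0.4, 0.5], [0.5, 2.5],
     [0.6, 1.0], [0.7, 2.0], [0.8, 0.7], [0.9, 1.2], [1.0, 2.2]]
err = [0, 1, 0, 0, 0, 0, 1, 1, 1, 1]
in_ad = [True] * 9 + [False]

rows = summarize(error_cohorts(X, err, max_depth=1), err, in_ad)
for row in rows:
    print(row)
```

In this toy case most of the errors tagged as inside the AD fall in the cohort with the highest error rate, which is the kind of pattern the abstract describes. In practice one would more likely use an established tree learner, for example scikit-learn's `DecisionTreeClassifier` with its `apply` method to obtain leaf assignments, rather than a hand-written splitter.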

Supporting availability of data and materials

Datasets and the Python-based error analysis implementation employed in this work are available at https://github.com/sjbarigye/erroranalysis. Original datasets, models, and performance metrics may be accessed via the public QsarDB repository at https://doi.org/10.15152/QDB.236 and https://doi.org/10.15152/QDB.206.

Funding

Not applicable.

Author information

Contributions

J.R.M: investigation, formal analysis, writing – original draft, visualization. E.A.M: methodology, validation, formal analysis. N.P.P: resources, conceptualization. E.C.T: software implementation & scripting. Y.P.C: investigation, data curation, methodology. G.A.C: software implementation, writing – review & editing. F.M.R: data curation, formal analysis. Y.M.P: methodology, writing – review & editing, project administration. S.J.B: conceptualization, formal analysis, methodology, writing – review & editing.

Corresponding author

Correspondence to Stephen J. Barigye.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

Not applicable.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mora, J.R., Marquez, E.A., Pérez-Pérez, N. et al. Rethinking the applicability domain analysis in QSAR models. J Comput Aided Mol Des 38, 9 (2024). https://doi.org/10.1007/s10822-024-00550-8

  • DOI: https://doi.org/10.1007/s10822-024-00550-8
