
Random Forest Model and Sample Explainer for Non-experts in Machine Learning – Two Case Studies

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12663)

Abstract

Machine Learning (ML) is becoming an increasingly critical technology in areas such as health and business, as well as in everyday applications of significant societal importance. However, the lack of explainability (the ability of ML systems to explain how they work, both at the model level, relating to the whole dataset, and at the sample level, relating to specific samples) poses significant challenges to their adoption and verification, and to ensuring the trust of users and the general public. We present a novel integrated Random Forest Model and Sample Explainer – RFEX. RFEX is specifically designed for an important class of users who are not ML experts but are often the domain experts and key decision makers. RFEX provides easy-to-analyze one-page Model and Sample explainability summaries in tabular format, with a wealth of explainability information including classification confidence, the tradeoff between accuracy and the number of features used, and the ability to identify potential outlier samples and features. We demonstrate RFEX on two case studies: mortality prediction for COVID-19 patients, using data obtained from Huazhong University of Science and Technology, Wuhan, China, and classification of cell type clusters for the human nervous system, based on data from the J. Craig Venter Institute. We show that RFEX offers a simple yet powerful means of explaining RF classification at the model, sample, and feature levels, as well as guidance for testing and developing explainable and cost-effective operational prediction models.
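The abstract's tabular Model Explainer idea (rank features by Random Forest importance, then show how accuracy trades off against the number of top-ranked features used) can be sketched as follows. This is an illustrative approximation, not the paper's RFEX implementation; it assumes scikit-learn, and the breast cancer dataset stands in for the paper's case-study data.

```python
# Illustrative sketch of the accuracy-vs-features tradeoff: rank features by
# Random Forest importance, then report cross-validated accuracy when only
# the top-k features are used.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]  # most important first

# One row per k: a tabular summary in the spirit of an RFEX Model Explainer.
for k in (1, 3, 5, 10, X.shape[1]):
    score = cross_val_score(
        RandomForestClassifier(n_estimators=100, random_state=0),
        X[:, ranking[:k]], y, cv=5,
    ).mean()
    print(f"top {k:2d} features -> mean CV accuracy {score:.3f}")
```

A non-ML-expert reader can scan such a table to judge how few features suffice for acceptable accuracy, which is also the basis for cost-effective operational models that measure fewer features.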



Acknowledgment

We are grateful to the researchers from Huazhong University of Science and Technology, Wuhan, China for their prompt response to our inquiry regarding the COVID-19 data, and to Dr. R. Scheuermann and B. Aevermann from JCVI for the data for our case study and their feedback. We are also grateful to Prof. Russ Altman (Stanford University) and Prof. Lester Kobzik (Harvard University) for their feedback and encouragement.

Author information


Corresponding author

Correspondence to D. Petkovic.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Petkovic, D., Alavi, A., Cai, D., Wong, M. (2021). Random Forest Model and Sample Explainer for Non-experts in Machine Learning – Two Case Studies. In: Del Bimbo, A., et al. (eds.) Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol. 12663. Springer, Cham. https://doi.org/10.1007/978-3-030-68796-0_5


  • DOI: https://doi.org/10.1007/978-3-030-68796-0_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-68795-3

  • Online ISBN: 978-3-030-68796-0

  • eBook Packages: Computer Science (R0)
