
Random Forest Model and Sample Explainer for Non-experts in Machine Learning – Two Case Studies

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12663)

Abstract

Machine Learning (ML) is becoming an increasingly critical technology in areas such as health and business, as well as in everyday applications of significant societal importance. However, the lack of explainability (the ability of ML systems to explain how they work, both at the model level, relating to the whole dataset, and at the sample level, relating to specific samples) poses significant challenges to their adoption and verification, and to ensuring the trust of users and the general public. We present a novel integrated Random Forest Model and Sample Explainer – RFEX. RFEX is specifically designed for an important class of users who are not ML experts but are often the domain experts and key decision makers. RFEX provides easy-to-analyze one-page Model and Sample explainability summaries in tabular format, with a wealth of explainability information including classification confidence, the tradeoff between accuracy and the number of features used, and the ability to identify potential outlier samples and features. We demonstrate RFEX on two case studies: mortality prediction for COVID-19 patients, using data obtained from Huazhong University of Science and Technology, Wuhan, China, and classification of cell type clusters for the human nervous system, based on data from the J. Craig Venter Institute. We show that RFEX offers a simple yet powerful means of explaining RF classification at the model, sample, and feature levels, as well as guidance for testing and developing explainable and cost-effective operational prediction models.
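The abstract's tabular Model Explainer idea (rank features by Random Forest importance, then show how accuracy trades off against the number of top-ranked features used) can be sketched as follows. This is an illustrative approximation, not the paper's RFEX implementation; it assumes scikit-learn, and the breast cancer dataset stands in for the paper's case-study data.

```python
# Illustrative sketch of the accuracy-vs-features tradeoff: rank features by
# Random Forest importance, then report cross-validated accuracy when only
# the top-k features are used.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]  # most important first

# One row per k: a tabular summary in the spirit of an RFEX Model Explainer.
for k in (1, 3, 5, 10, X.shape[1]):
    score = cross_val_score(
        RandomForestClassifier(n_estimators=100, random_state=0),
        X[:, ranking[:k]], y, cv=5,
    ).mean()
    print(f"top {k:2d} features -> mean CV accuracy {score:.3f}")
```

A non-ML-expert reader can scan such a table to judge how few features suffice for acceptable accuracy, which is also the basis for cost-effective operational models that measure fewer features.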



Acknowledgment

We are grateful to the researchers from Huazhong University of Science and Technology, Wuhan, China for their prompt response to our inquiry regarding the COVID-19 data, and to Dr. R. Scheuermann and B. Aevermann from JCVI for the data for our case study and their feedback. We are also grateful to Prof. Russ Altman (Stanford University) and Prof. Lester Kobzik (Harvard University) for their feedback and encouragement.

Author information


Corresponding author

Correspondence to D. Petkovic.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Petkovic, D., Alavi, A., Cai, D., Wong, M. (2021). Random Forest Model and Sample Explainer for Non-experts in Machine Learning – Two Case Studies. In: Del Bimbo, A., et al. (eds.) Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol. 12663. Springer, Cham. https://doi.org/10.1007/978-3-030-68796-0_5


  • DOI: https://doi.org/10.1007/978-3-030-68796-0_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-68795-3

  • Online ISBN: 978-3-030-68796-0

  • eBook Packages: Computer Science (R0)
