Introducing the Rank-Biased Overlap as Similarity Measure for Feature Importance in Explainable Machine Learning: A Case Study on Parkinson’s Disease

  • Conference paper
Brain Informatics (BI 2022)

Abstract

Feature importance is one of the most common explanations provided by Machine Learning (ML). However, different classification algorithms or different training sets can produce different rankings of predictive features. Quantifying the differences between feature importance rankings is therefore crucial for assessing model trustworthiness. Rank-biased Overlap (RBO) is a similarity measure between incomplete, top-weighted and indefinite rankings, which are all characteristics of feature importance. In RBO, tuning the persistence parameter p allows rankings to be truncated at any arbitrary depth, so that their overlap can be evaluated over an increasing number of features. Classification of Parkinson’s disease (PD) with the Explainable Boosting Machine (EBM) was chosen here as a case study for introducing RBO in ML. An imbalanced dataset of 168 healthy controls (HC) and 396 PD patients, with 178 clinical and imaging features, was obtained from PPMI. The imbalanced, undersampled (K-Medoids) and oversampled (SMOTE) datasets were each used to train an EBM, yielding their respective feature importance rankings. The RBO score was calculated between ranking pairs while incrementally increasing the depth by five features, from 1 to 178. All classifiers reached excellent AUC-ROC (~1) on the test set, demonstrating the stability of EBM predictions when trained on imbalanced datasets. RBO revealed that the maximum overlap (80%) among rankings was obtained when truncating at the top 40 features, while their similarity decreased asymptotically to 50% when more than 45 features were considered. Thanks to RBO, it was possible to demonstrate that, for the same accuracy, the more similar the feature importance rankings are, the more stable the model and the more reliable the ML interpretability.
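To make the depth-wise comparison concrete, the following is a minimal Python sketch of RBO between two rankings truncated at a common depth. It assumes the extrapolated form of RBO for equal-length prefixes from Webber et al. (2010) and rankings without duplicate items; the function names `rbo_ext` and `rbo_by_depth`, the default `p=0.9`, and the exact depth grid are illustrative assumptions, not the authors' implementation.

```python
def rbo_ext(s, t, p=0.9, depth=None):
    """Extrapolated RBO of two duplicate-free rankings cut at the same depth.

    Assumed formula (Webber et al., 2010, equal-length case):
        RBO_ext = (X_k / k) * p**k + ((1 - p) / p) * sum_{d=1..k} (X_d / d) * p**d
    where X_d is the number of items shared by the two top-d prefixes.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    k = min(len(s), len(t)) if depth is None else min(depth, len(s), len(t))
    if k == 0:
        return 0.0

    seen_s, seen_t = set(), set()
    overlap = 0        # X_d, updated incrementally
    weighted = 0.0     # running sum of (X_d / d) * p**d
    for d in range(1, k + 1):
        a, b = s[d - 1], t[d - 1]
        if a == b:
            overlap += 1
        else:
            # a new item counts once it has already appeared in the other list
            overlap += (a in seen_t) + (b in seen_s)
        seen_s.add(a)
        seen_t.add(b)
        weighted += (overlap / d) * p ** d

    return (overlap / k) * p ** k + ((1 - p) / p) * weighted


def rbo_by_depth(s, t, p=0.9, step=5):
    """RBO at depth 1, at every multiple of `step`, and at the full length.

    The grid is a guess at "increasing the depth by five features".
    """
    n = min(len(s), len(t))
    depths = sorted({1, n, *range(step, n + 1, step)})
    return {d: rbo_ext(s, t, p, depth=d) for d in depths}
```

Calling `rbo_by_depth(ranking_a, ranking_b)` on two lists of feature names, each sorted by decreasing importance, returns one score per depth, which can then be plotted to see where the overlap peaks and where it levels off. Identical rankings score 1 and disjoint rankings score 0 at every depth.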



Author information

Corresponding author

Correspondence to Alessia Sarica.

Copyright information

© 2022 Springer Nature Switzerland AG

Cite this paper

Sarica, A., Quattrone, A., Quattrone, A. (2022). Introducing the Rank-Biased Overlap as Similarity Measure for Feature Importance in Explainable Machine Learning: A Case Study on Parkinson’s Disease. In: Mahmud, M., He, J., Vassanelli, S., van Zundert, A., Zhong, N. (eds) Brain Informatics. BI 2022. Lecture Notes in Computer Science, vol 13406. Springer, Cham. https://doi.org/10.1007/978-3-031-15037-1_11

  • DOI: https://doi.org/10.1007/978-3-031-15037-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15036-4

  • Online ISBN: 978-3-031-15037-1

  • eBook Packages: Computer Science, Computer Science (R0)
