Simple Explanations to Summarise Subgroup Discovery Outcomes: A Case Study Concerning Patient Phenotyping

  • Conference paper
Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2022)

Abstract

Phenotyping is essential in medical research because clinical phenotypes identify subsets of patients with common characteristics and thereby provide a better understanding of healthcare problems. Subgroup discovery (SD) is a promising machine learning approach for this task: it provides a framework with which to search for interesting subgroups according to the relations between individual characteristics and a target value. Each single pattern extracted by SD algorithms is human-readable. However, the complexity of each pattern (the number of attributes involved) and the high number of subgroups obtained make the overall model difficult to understand. In this work, we propose a method for explaining SD outcomes that is designed for the clinical context. We employ a two-step process to obtain SD model-agnostic explanations based on a decision tree surrogate model. Because explainable methods are difficult to evaluate, we adopt a threefold evaluation strategy. We first show how explanations are built and test them on a selection of state-of-the-art SD algorithms and gold-standard datasets. We then illustrate the suitability of the method in a clinical use case for an antimicrobial resistance problem. Finally, we study the utility of the method by surveying a small group in order to validate it from a human-centric perspective.
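As a rough illustration of the surrogate idea mentioned in the abstract, the sketch below labels each record by whether any discovered subgroup covers it and then fits the simplest possible decision tree (a single-split stump chosen by Gini impurity) to those labels. It is not the authors' two-step process, which this page does not detail. The toy patient records, the subgroup descriptions, and the helper names `covers`, `label_by_coverage`, `gini` and `best_stump` are all assumptions made for the example.

```python
# Toy records with hypothetical categorical attributes (not data from the paper).
patients = [
    {"age": "old", "ward": "icu", "prior_antibiotics": "yes"},
    {"age": "old", "ward": "icu", "prior_antibiotics": "no"},
    {"age": "young", "ward": "icu", "prior_antibiotics": "yes"},
    {"age": "old", "ward": "general", "prior_antibiotics": "yes"},
    {"age": "young", "ward": "general", "prior_antibiotics": "no"},
    {"age": "young", "ward": "general", "prior_antibiotics": "yes"},
]

# Hypothetical SD output: each subgroup is a conjunction of attribute=value selectors.
subgroups = [
    {"age": "old", "ward": "icu"},
    {"ward": "icu", "prior_antibiotics": "yes"},
]


def covers(subgroup, record):
    """True if the record satisfies every selector of the subgroup."""
    return all(record.get(attr) == value for attr, value in subgroup.items())


def label_by_coverage(records, subgroups):
    """Label a record True if at least one subgroup covers it."""
    return [any(covers(s, r) for s in subgroups) for r in records]


def gini(labels):
    """Gini impurity of a list of boolean labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_stump(records, labels):
    """Return (attribute, value, weighted_gini) for the best single equality test."""
    n = len(records)
    candidates = sorted({(a, v) for r in records for a, v in r.items()})
    best = None
    for attr, value in candidates:
        left = [y for r, y in zip(records, labels) if r[attr] == value]
        right = [y for r, y in zip(records, labels) if r[attr] != value]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[2]:
            best = (attr, value, score)
    return best


labels = label_by_coverage(patients, subgroups)
attr, value, impurity = best_stump(patients, labels)
print(f"Surrogate split on {attr} == {value!r}, weighted Gini = {impurity:.3f}")
```

In this toy setting a single test on `ward` separates covered from uncovered records, so one short rule summarises both subgroups. A practical surrogate would normally be a deeper tree grown with an established library, and its fidelity to the SD output would need to be measured.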

Notes

  1. Available at PyPI, https://pypi.org/project/subgroups/.

  2. Available at GitHub, https://github.com/Enrique-Val/SubgroupExplainer.

Acknowledgement

This work was partially funded by the CONFAINCE project (Ref: PID2021-122194OB-I00), supported by the Spanish Ministry of Science and Innovation and the Spanish Agency for Research, and by the IMPACT-T2D project (PMP21/00092), supported by the Spanish Health Institute Carlos III (ISCIII).

Author information

Corresponding author

Correspondence to Jose M. Juarez.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Valero-Leal, E., Campos, M., Juarez, J.M. (2023). Simple Explanations to Summarise Subgroup Discovery Outcomes: A Case Study Concerning Patient Phenotyping. In: Koprinska, I., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2022. Communications in Computer and Information Science, vol 1752. Springer, Cham. https://doi.org/10.1007/978-3-031-23618-1_29

  • DOI: https://doi.org/10.1007/978-3-031-23618-1_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23617-4

  • Online ISBN: 978-3-031-23618-1

  • eBook Packages: Computer Science, Computer Science (R0)
