Simple Explanations to Summarise Subgroup Discovery Outcomes: A Case Study Concerning Patient Phenotyping

  • Conference paper
Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2022)

Abstract

Phenotyping is essential in medical research: clinical phenotypes identify subsets of patients with common characteristics and thereby provide a better understanding of healthcare problems. Subgroup discovery (SD) is a promising machine learning approach because it provides a framework for searching for interesting subgroups according to the relations between individual characteristics and a target value. Each pattern extracted by an SD algorithm is human-readable. However, the complexity of each pattern (the number of attributes involved) and the high number of subgroups obtained make the overall model difficult to understand. In this work, we propose a method for explaining SD outcomes, designed for the clinical context. We employ a two-step process to obtain SD model-agnostic explanations based on a decision tree surrogate model. Because evaluating explainability methods is complex, we adopt a threefold evaluation strategy. We first show how explanations are built and test the method with a selection of state-of-the-art SD algorithms and gold-standard datasets. We then illustrate the suitability of the method in a clinical use case concerning an antimicrobial resistance problem. Finally, we study the utility of the method by surveying a small group of people in order to validate it from a human-centric perspective.
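To make the idea of a decision tree surrogate over SD outcomes more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not spell out the two steps, so the split used here is an assumption: first label each record by whether any discovered subgroup covers it, then fit a shallow tree to those labels and read it as a summary. The toy records, the two subgroups, and all function names are hypothetical.

```python
from collections import Counter

# Toy records (hypothetical data, not taken from the paper).
RECORDS = [
    {"age": "old", "icu": "yes", "prior_abx": "yes"},
    {"age": "old", "icu": "yes", "prior_abx": "no"},
    {"age": "old", "icu": "no", "prior_abx": "yes"},
    {"age": "young", "icu": "yes", "prior_abx": "yes"},
    {"age": "young", "icu": "no", "prior_abx": "no"},
    {"age": "young", "icu": "no", "prior_abx": "yes"},
]

# Hypothetical SD output: each subgroup is a conjunction of attribute=value selectors.
SUBGROUPS = [
    {"age": "old", "icu": "yes"},
    {"icu": "yes", "prior_abx": "yes"},
]


def covers(subgroup, record):
    """True if the record satisfies every selector of the subgroup."""
    return all(record[attr] == value for attr, value in subgroup.items())


def label_coverage(records, subgroups):
    """Assumed step 1: mark each record as covered by at least one subgroup."""
    return [any(covers(s, r) for s in subgroups) for r in records]


def gini(labels):
    """Gini impurity of a list of labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_tree(records, labels, attributes, max_depth):
    """Assumed step 2: fit a small categorical tree to the coverage labels.

    A leaf is a bool; an internal node is a tuple (attribute, {value: subtree}).
    """
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1 or not attributes:
        return majority

    def split_impurity(attr):
        # Weighted Gini impurity after splitting on attr.
        total = 0.0
        for value in set(r[attr] for r in records):
            part = [l for r, l in zip(records, labels) if r[attr] == value]
            total += len(part) / len(labels) * gini(part)
        return total

    best = min(attributes, key=split_impurity)
    remaining = [a for a in attributes if a != best]
    children = {}
    for value in sorted(set(r[best] for r in records)):
        idx = [i for i, r in enumerate(records) if r[best] == value]
        children[value] = fit_tree(
            [records[i] for i in idx], [labels[i] for i in idx],
            remaining, max_depth - 1,
        )
    return (best, children)


def predict(tree, record):
    """Follow the tree to a leaf (only handles values seen during fitting)."""
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[record[attr]]
    return tree


labels = label_coverage(RECORDS, SUBGROUPS)
tree = fit_tree(RECORDS, labels, ["age", "icu", "prior_abx"], max_depth=2)

# Fidelity: share of records where the surrogate agrees with subgroup coverage.
fidelity = sum(predict(tree, r) == l for r, l in zip(RECORDS, labels)) / len(RECORDS)
print(tree, fidelity)
```

On this toy data the two overlapping subgroups collapse into a single test on `icu`, which is the kind of compression a surrogate can offer. On real data the tree would only approximate the subgroup set, so the fidelity score should be reported alongside any summary.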

Notes

  1. Available at PyPI, https://pypi.org/project/subgroups/.

  2. Available at GitHub, https://github.com/Enrique-Val/SubgroupExplainer.

Acknowledgement

This work was partially funded by the CONFAINCE project (Ref: PID2021-122194OB-I00), supported by the Spanish Ministry of Science and Innovation and the Spanish Agency for Research, and by the IMPACT-T2D project (PMP21/00092), supported by the Spanish Health Institute Carlos III (ISCIII).

Author information

Corresponding author

Correspondence to Jose M. Juarez.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Valero-Leal, E., Campos, M., Juarez, J.M. (2023). Simple Explanations to Summarise Subgroup Discovery Outcomes: A Case Study Concerning Patient Phenotyping. In: Koprinska, I., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2022. Communications in Computer and Information Science, vol 1752. Springer, Cham. https://doi.org/10.1007/978-3-031-23618-1_29

  • DOI: https://doi.org/10.1007/978-3-031-23618-1_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23617-4

  • Online ISBN: 978-3-031-23618-1

  • eBook Packages: Computer Science (R0)
