A Reproducibility Study of Subgroup Discovery Algorithms

  • Conference paper
New Trends in Database and Information Systems (ADBIS 2024)

Abstract

Subgroup discovery is an important data mining technique that aims to uncover specific segments of data that exhibit noteworthy patterns or behaviors. Despite the existence of several subgroup discovery algorithms, there is no thorough comparative analysis of their effectiveness. This paper evaluates two recent subgroup discovery algorithms against the seminal PRIM method, demonstrating PRIM’s superior accuracy, speed, and effectiveness in identifying interesting subgroups. These findings highlight the need for a comprehensive evaluation of subgroup discovery techniques to determine the state of the art.

Notes

  1. https://github.com/Arzik1987/Prim_SuRF_DivExplorer

  2. https://github.com/Skeftical/SuRF-Reproducibility

  3. https://cran.r-project.org/web/packages/prim/index.html

  4. https://github.com/divexplorer/divexplorer

  5. Unfortunately, we do not have the dataset with undiscretized attributes to perform these experiments.

Corresponding author

Correspondence to Vadim Arzamasov.

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Arzamasov, V., Böhm, K. (2025). A Reproducibility Study of Subgroup Discovery Algorithms. In: Tekli, J., et al. New Trends in Database and Information Systems. ADBIS 2024. Communications in Computer and Information Science, vol 2186. Springer, Cham. https://doi.org/10.1007/978-3-031-70421-5_1

  • DOI: https://doi.org/10.1007/978-3-031-70421-5_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70420-8

  • Online ISBN: 978-3-031-70421-5

  • eBook Packages: Computer Science (R0)
