Abstract
Subgroup discovery is an important data mining technique that aims to uncover specific segments of data that exhibit noteworthy patterns or behaviors. Despite the existence of several subgroup discovery algorithms, there is no thorough comparative analysis of their effectiveness. This paper evaluates two recent subgroup discovery algorithms against the seminal PRIM method, demonstrating PRIM’s superior accuracy, speed, and effectiveness in identifying interesting subgroups. These findings highlight the need for a comprehensive evaluation of subgroup discovery techniques to determine the state of the art.
Notes
Unfortunately, we do not have the dataset with undiscretized attributes to perform these experiments.
References
Angwin, J., et al.: Machine bias. ProPublica (May 2016)
Atzmueller, M.: Subgroup Discovery. WIREs Data Min. Knowl. Discov. 5, 35–49 (2015)
Bryant, B.P., Lempert, R.J.: Thinking inside the box: a participatory, computer-assisted approach to scenario discovery. Technol. Forecast. Soc. Change 77, 34–49 (2010)
Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: KDD (2016)
Dazard, J.E., Rao, J.S.: Local sparse bump hunting. J. Comput. Graph. Stat. 19(4), 900–929 (2010)
Duivesteijn, W., Feelders, A.J., Knobbe, A.: Exceptional Model Mining. Data Min. Knowl. Discov. 30(1), 47–98 (2015). https://doi.org/10.1007/s10618-015-0403-4
Friedman, J.H., Fisher, N.I.: Bump hunting in high-dimensional data. Stat. Comput. 9(2), 123–143 (1999). https://doi.org/10.1023/A:1008894516817
Grosskreutz, H., Rüping, S.: On subgroup discovery in numerical domains. Data Min. Knowl. Discov. 19(2), 210–226 (2009). https://doi.org/10.1007/s10618-009-0136-3
Herrera, F., et al.: An overview on subgroup discovery: foundations and applications. Knowl. Inf. Syst. 29(3) (2011)
Leman, D., et al.: Exceptional model mining. In: ECML/PKDD (2008)
Pastor, E., et al.: Looking for trouble: analyzing classifier behavior via pattern divergence. In: SIGMOD Conference. ACM (2021)
Savva, F., et al.: Surf: identification of interesting data regions with surrogate models. In: ICDE (2020)
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Arzamasov, V., Böhm, K. (2025). A Reproducibility Study of Subgroup Discovery Algorithms. In: Tekli, J., et al. New Trends in Database and Information Systems. ADBIS 2024. Communications in Computer and Information Science, vol 2186. Springer, Cham. https://doi.org/10.1007/978-3-031-70421-5_1
DOI: https://doi.org/10.1007/978-3-031-70421-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70420-8
Online ISBN: 978-3-031-70421-5
eBook Packages: Computer Science (R0)