Efficient Subgroup Discovery Through Auto-Encoding

  • Conference paper
  • In: Advances in Intelligent Data Analysis XX (IDA 2022)

Abstract

Current subgroup discovery methods struggle to produce good results for large real-life datasets with high dimensionality: run times become high, and dependencies between attributes are hard to capture. We propose a method in which auto-encoding is applied for dimensionality reduction before subgroup discovery is performed. In an experimental study, we find that auto-encoding increases both quality and coverage on our dataset with over 500 attributes. On the dataset with over 250 attributes and on the one with the most instances, coverage improves while quality remains similar. For smaller datasets, quality and coverage remain similar or decrease slightly. Additionally, we greatly improve the run time for every dataset-algorithm combination; for the datasets with over 250 and over 500 attributes, run times decrease on average by factors of 150 and 200, respectively. We conclude that dimensionality reduction is a promising method for subgroup discovery in datasets with many attributes and/or a high number of instances.
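The approach summarized above (reduce the dimensionality with an auto-encoder, then run standard subgroup discovery on the encoded attributes) can be illustrated in a few lines of Python. The sketch below uses the Keras API of TensorFlow for the auto-encoder and the pysubgroup library for the search; the network shape, training settings, binary target, and beam-search parameters are illustrative assumptions, not the configuration used in the paper's experiments.

import pandas as pd
import pysubgroup as ps
from tensorflow import keras

def encode_then_discover(X: pd.DataFrame, y: pd.Series, latent_dim: int = 8):
    # Auto-encoder: compress the original attribute space to latent_dim dimensions.
    n_features = X.shape[1]
    inputs = keras.Input(shape=(n_features,))
    code = keras.layers.Dense(latent_dim, activation="relu")(inputs)
    outputs = keras.layers.Dense(n_features, activation="linear")(code)
    autoencoder = keras.Model(inputs, outputs)
    encoder = keras.Model(inputs, code)
    autoencoder.compile(optimizer="adam", loss="mse")
    autoencoder.fit(X.values, X.values, epochs=50, batch_size=64, verbose=0)

    # One encoded item per original individual, in the same row order.
    Z = pd.DataFrame(encoder.predict(X.values, verbose=0),
                     columns=[f"z{i}" for i in range(latent_dim)],
                     index=X.index)
    Z["target"] = y.values

    # Subgroup discovery on the encoded attributes (beam search, WRAcc quality).
    target = ps.BinaryTarget("target", True)
    selectors = ps.create_selectors(Z, ignore=["target"])
    task = ps.SubgroupDiscoveryTask(Z, target, selectors,
                                    result_set_size=10, depth=2, qf=ps.WRAccQF())
    return ps.BeamSearch().execute(task)

The discovered subgroup descriptions then range over the encoded attributes rather than the original ones; Note 2 below explains how their coverage can still be compared with that of subgroups found in the original data space.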

J. F. van der Haar, S. C. Nagelkerken, I. G. Smit, K. van Straaten and J. A. Tack—These authors contributed equally to this work.

Notes

  1. cf. GitHub repository at https://github.com/JFvdH/Efficient-SD-through-AE.

  2. Note that, to make these comparisons, we must compare the presence or absence of individuals in subgroups in the original data space with the presence or absence of encoded items in subgroups in the encoded space. Naively, this may seem nontrivial, but the number of individuals and the number of items are identical: encoding changes the representation of each individual and may change its number of attributes, but each individual has exactly one counterpart item in the encoded space. This enables identification of added and lost items across the divide between the original data space and the encoded space.
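The one-to-one correspondence described above can be exploited directly in code. The following sketch assumes that, for a given subgroup in each space, coverage is available as a boolean mask over the same row order; the function and variable names are hypothetical and not taken from the paper's implementation.

import numpy as np

def membership_changes(cover_original, cover_encoded):
    # Both arguments are boolean arrays of length n, one entry per individual,
    # aligned because row i of the encoded data is the unique counterpart of
    # row i of the original data.
    cover_original = np.asarray(cover_original, dtype=bool)
    cover_encoded = np.asarray(cover_encoded, dtype=bool)
    added = np.flatnonzero(~cover_original & cover_encoded)  # gained in the encoded space
    lost = np.flatnonzero(cover_original & ~cover_encoded)   # lost in the encoded space
    kept = np.flatnonzero(cover_original & cover_encoded)    # covered in both spaces
    return added, lost, kept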


Acknowledgments

This work is part of the research program Data2People, project EDIC, and is partly financed by the Dutch Research Council (NWO).

Author information

Corresponding author

Correspondence to Wouter Duivesteijn.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

van der Haar, J.F. et al. (2022). Efficient Subgroup Discovery Through Auto-Encoding. In: Bouadi, T., Fromont, E., Hüllermeier, E. (eds) Advances in Intelligent Data Analysis XX. IDA 2022. Lecture Notes in Computer Science, vol 13205. Springer, Cham. https://doi.org/10.1007/978-3-031-01333-1_26

  • DOI: https://doi.org/10.1007/978-3-031-01333-1_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-01332-4

  • Online ISBN: 978-3-031-01333-1

  • eBook Packages: Computer Science, Computer Science (R0)
