Subgroup Mining

Klösgen, W.

doi:10.1007/978-3-7091-2588-5_2

Part of the book series: International Centre for Mechanical Sciences (CISM, volume 408)

Abstract

Statistical findings on subgroups are among the most popular and simple forms of knowledge we encounter in all domains of science, business, and even daily life. We read or hear such messages as: the lung cancer mortality rate has considerably increased for women during the last 10 years; the unemployment rate is disproportionately high for young men with a low educational level; the potential for violence is highest for males between 14 and 18. In this paper, we first compare knowledge expressed by subgroup patterns with other popular knowledge types of Knowledge Discovery in Databases (KDD), introduce types of description languages for subgroups, and summarize general pattern classes for subgroup deviations and associations. A deviation pattern describes a deviating behavior of a target variable in a subgroup. Deviation patterns rely on statistical tests and thus capture knowledge about a subgroup in the form of a verified (alternative) hypothesis on the distribution of a target variable. The search for deviating subgroups is organized in two phases. In a first, brute force search phase, alternative search heuristics can be applied to find a set of deviating subgroups. In a second, refinement phase, redundancy elimination operators identify a system of subgroups. We discuss the role of tests for subgroup mining, introduce specializations of the general deviation pattern, summarize search approaches, and deal with navigation and visualization operations that support an analyst in interactively constructing a best system of deviating subgroups.
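
The two-phase search outlined in the abstract can be illustrated with a minimal sketch. It assumes a binary target variable, conjunctive attribute-value descriptions, and a binomial-test z-score as the quality measure. The toy data, the function names, the threshold, and the simple redundancy rule are all illustrative assumptions and are not taken from the chapter.

```python
from itertools import combinations
from math import sqrt

def make_records():
    """Hypothetical toy data: 8 cells of 5 persons each (not from the chapter)."""
    cells = [  # (sex, age, region, number of unemployed among 5)
        ("m", "young", "north", 4), ("m", "young", "south", 4),
        ("m", "old", "north", 1), ("m", "old", "south", 1),
        ("f", "young", "north", 2), ("f", "young", "south", 1),
        ("f", "old", "north", 1), ("f", "old", "south", 1),
    ]
    records = []
    for sex, age, region, positives in cells:
        for i in range(5):
            records.append({"sex": sex, "age": age, "region": region,
                            "unemployed": 1 if i < positives else 0})
    return records

def z_score(n, p, p0):
    """z-score of a subgroup share p (subgroup size n) against the overall share p0."""
    if n == 0 or p0 in (0.0, 1.0):
        return 0.0
    return (p - p0) * sqrt(n) / sqrt(p0 * (1.0 - p0))

def covers(description, record):
    """A description is a tuple of (attribute, value) selectors, read as a conjunction."""
    return all(record[a] == v for a, v in description)

def brute_force_search(records, target, attributes, p0, max_depth=2, min_z=1.0):
    """Phase 1: enumerate all conjunctions up to max_depth and keep the deviating ones."""
    selectors = sorted({(a, r[a]) for r in records for a in attributes})
    found = []
    for depth in range(1, max_depth + 1):
        for description in combinations(selectors, depth):
            if len({a for a, _ in description}) < depth:
                continue  # two values of the same attribute can never both hold
            members = [r for r in records if covers(description, r)]
            if not members:
                continue
            p = sum(r[target] for r in members) / len(members)
            z = z_score(len(members), p, p0)
            if z >= min_z:
                found.append((description, len(members), p, z))
    # Stable sort: on ties, shorter (more general) descriptions stay first.
    return sorted(found, key=lambda f: -f[3])

def eliminate_redundant(found):
    """Phase 2 (assumed rule): drop a subgroup if a kept generalization scores at least as high."""
    kept = []
    for candidate in found:  # found is sorted by descending z
        if not any(set(k[0]) < set(candidate[0]) for k in kept):
            kept.append(candidate)
    return kept

records = make_records()
p0 = sum(r["unemployed"] for r in records) / len(records)
found = brute_force_search(records, "unemployed", ["sex", "age", "region"], p0)
kept = eliminate_redundant(found)

for description, n, p, z in kept:
    label = " AND ".join(f"{a}={v}" for a, v in description)
    print(f"{label}: n={n}, share={p:.2f} (overall {p0:.3f}), z={z:.2f}")
```

In this sketch the subgroup "age=young AND region=north" passes the first phase but is dropped in the second, because its generalization "age=young" deviates at least as strongly. A real system would offer other test statistics, search heuristics, and redundancy operators.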

Author information

Authors and Affiliations

German National Research Center for Information Technology (GMD), Sankt Augustin, Germany
W. Klösgen

Editor information

Editors and Affiliations

University of Udine, Italy
Giacomo Della Riccia
Otto-von-Guericke University, Germany
Rudolf Kruse
Free University of Berlin, Germany
Hans-J. Lenz

About this paper

Cite this paper

Klösgen, W. (2000). Subgroup Mining. In: Della Riccia, G., Kruse, R., Lenz, HJ. (eds) Computational Intelligence in Data Mining. International Centre for Mechanical Sciences, vol 408. Springer, Vienna. https://doi.org/10.1007/978-3-7091-2588-5_2

DOI: https://doi.org/10.1007/978-3-7091-2588-5_2
Publisher Name: Springer, Vienna
Print ISBN: 978-3-211-83326-1
Online ISBN: 978-3-7091-2588-5
eBook Packages: Springer Book Archive
