Abstract
Statistical findings on subgroups are among the most popular and simplest forms of knowledge we encounter in all domains of science, business, and even daily life. We read or hear messages such as: "Lung cancer mortality rate has considerably increased for women during the last 10 years", "the unemployment rate is disproportionately high for young men with a low educational level", or "the potential for violence is highest among males between 14 and 18". In this paper, we first compare knowledge expressed by subgroup patterns with other popular knowledge types of Knowledge Discovery in Databases (KDD), introduce types of description languages for subgroups, and summarize general pattern classes for subgroup deviations and associations. A deviation pattern describes a deviating behavior of a target variable in a subgroup. Deviation patterns rely on statistical tests and thus capture knowledge about a subgroup in the form of a verified (alternative) hypothesis on the distribution of a target variable. The search for deviating subgroups is organized in two phases. In a first, brute-force search phase, alternative search heuristics can be applied to find a set of deviating subgroups. In a second, refinement phase, redundancy elimination operators identify a system of subgroups. We discuss the role of tests in subgroup mining, introduce specializations of the general deviation pattern, summarize search approaches, and deal with navigation and visualization operations that support an analyst in interactively constructing the best system of deviating subgroups.
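The two-phase search described in the abstract can be illustrated with a small sketch. It assumes a binary target variable, conjunctive attribute=value subgroup descriptions, and a z-score quality function derived from the binomial test, sqrt(n) * (p - p0) / sqrt(p0 * (1 - p0)), which is one common choice for testing whether the target share p in a subgroup of size n deviates from the population share p0. The redundancy elimination step is a deliberately simplified stand-in (drop any subgroup that is a generalization or specialization of a better one), not the operators from the paper. The data and all function names (`make_toy_data`, `brute_force_search`, `eliminate_redundancy`) are hypothetical.

```python
import math
from itertools import combinations, product

# Hypothetical nominal attributes used to describe subgroups.
ATTRIBUTES = ("sex", "age", "edu")


def make_toy_data():
    """Hypothetical data: 10 records per (sex, age, edu) cell; the binary
    target is true for 8 of 10 young, low-education men, 2 of 10 otherwise."""
    rows = []
    for sex, age, edu in product(("m", "f"), ("young", "old"), ("low", "high")):
        k = 8 if (sex, age, edu) == ("m", "young", "low") else 2
        for i in range(10):
            rows.append({"sex": sex, "age": age, "edu": edu, "target": i < k})
    return rows


def binomial_quality(n, p, p0):
    """z-score style quality of a subgroup with n members and target share p."""
    return math.sqrt(n) * (p - p0) / math.sqrt(p0 * (1 - p0))


def brute_force_search(rows, max_depth=3, min_support=5, threshold=1.0):
    """Phase 1: evaluate every conjunction of up to max_depth selectors on
    distinct attributes; keep positive deviations with quality >= threshold."""
    p0 = sum(r["target"] for r in rows) / len(rows)
    values = {a: sorted({r[a] for r in rows}) for a in ATTRIBUTES}
    found = []
    for depth in range(1, max_depth + 1):
        for attrs in combinations(ATTRIBUTES, depth):
            for vals in product(*(values[a] for a in attrs)):
                desc = frozenset(zip(attrs, vals))
                members = [r for r in rows if all(r[a] == v for a, v in desc)]
                n = len(members)
                if n < min_support:
                    continue
                p = sum(r["target"] for r in members) / n
                q = binomial_quality(n, p, p0)
                if q >= threshold:
                    found.append((q, desc))
    return found


def eliminate_redundancy(found):
    """Phase 2 (simplified stand-in): visit subgroups by decreasing quality and
    keep one only if no kept subgroup is a generalization or specialization."""
    kept = []
    for q, desc in sorted(found, key=lambda t: t[0], reverse=True):
        if not any(desc <= d or d <= desc for _, d in kept):
            kept.append((q, desc))
    return kept


if __name__ == "__main__":
    rows = make_toy_data()
    for q, desc in eliminate_redundancy(brute_force_search(rows)):
        print(sorted(desc), round(q, 2))
```

On this toy data the first phase finds seven subgroups above the threshold (three single selectors, three pairs, one triple); the refinement phase keeps only the subgroup of young men with low education, because every other candidate is a generalization of that better subgroup.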
Copyright information
© 2000 Springer-Verlag Wien
About this paper
Cite this paper
Klösgen, W. (2000). Subgroup Mining. In: Della Riccia, G., Kruse, R., Lenz, HJ. (eds) Computational Intelligence in Data Mining. International Centre for Mechanical Sciences, vol 408. Springer, Vienna. https://doi.org/10.1007/978-3-7091-2588-5_2
DOI: https://doi.org/10.1007/978-3-7091-2588-5_2
Publisher Name: Springer, Vienna
Print ISBN: 978-3-211-83326-1
Online ISBN: 978-3-7091-2588-5
eBook Packages: Springer Book Archive