Abstract
Data mining is usually introduced as the search for interesting patterns in data. It is often an exploratory step performed iteratively within a process of knowledge discovery in databases (KDD). A mining step typically relies on strategies for systematic search in large hypothesis spaces, guided by the autonomous evaluation of statistical tests. We describe the subgroup mining approach, which is based on deviation and association patterns. A typical database contains values of attributes for many objects (persons, transactions, documents). Subgroup mining searches for interpretable subgroups of these objects that deviate from a designated expected behavior. Many types of data analysis questions can be answered by subgroup mining with diverse specializations of general deviation and association patterns. Statistical tests measure the interestingness of subgroup deviations. After summarizing the approach by discussing the fundamental components of subgroup pattern classes concerning validation, search, and interactive presentation of pattern instances, we explain how the deviation patterns of subgroup mining are applied to temporal, spatial, and textual databases.
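To make the notion of a deviation pattern concrete, the following is a minimal sketch, not the paper's own algorithm. It assumes a binary target attribute, subgroups described by a single condition of the form attribute = value, and a binomial-test style score sqrt(n) * (p - p0) / sqrt(p0 * (1 - p0)) as the interestingness measure, where n is the subgroup size, p the target share in the subgroup, and p0 the target share in the whole population. The function names, the attribute names, and the toy data are all hypothetical.

```python
from math import sqrt


def z_score(n, p, p0):
    """Binomial-test style deviation of a subgroup (size n, target share p)
    from the overall target share p0. Assumed measure, for illustration only."""
    if n == 0 or p0 in (0.0, 1.0):
        return 0.0
    return sqrt(n) * (p - p0) / sqrt(p0 * (1.0 - p0))


def mine_subgroups(rows, target, min_size=2):
    """Score every single-condition subgroup `attribute == value`.

    Returns the overall target share and a list of
    (score, attribute, value, size, share) tuples, strongest deviation first.
    """
    p0 = sum(1 for r in rows if r[target]) / len(rows)
    # Hypothesis space: every attribute/value pair that occurs in the data.
    conditions = {(a, r[a]) for r in rows for a in r if a != target}
    results = []
    for attr, value in conditions:
        members = [r for r in rows if r[attr] == value]
        n = len(members)
        if n < min_size:
            continue  # skip subgroups too small to be meaningful
        p = sum(1 for r in members if r[target]) / n
        results.append((z_score(n, p, p0), attr, value, n, p))
    # Sort by absolute deviation, with a deterministic tie-break.
    results.sort(key=lambda t: (-abs(t[0]), t[1], str(t[2])))
    return p0, results


# Hypothetical toy database: one dict per object.
ROWS = [
    {"region": "north", "age": "young", "buys": True},
    {"region": "north", "age": "old", "buys": True},
    {"region": "north", "age": "young", "buys": True},
    {"region": "south", "age": "young", "buys": False},
    {"region": "south", "age": "old", "buys": False},
    {"region": "south", "age": "old", "buys": True},
    {"region": "south", "age": "young", "buys": False},
    {"region": "north", "age": "old", "buys": False},
]

if __name__ == "__main__":
    p0, found = mine_subgroups(ROWS, "buys")
    print(f"overall share: {p0:.2f}")
    for score, attr, value, n, p in found:
        print(f"{attr}={value}: n={n}, share={p:.2f}, z={score:+.2f}")
```

On this toy data the two region subgroups deviate from the overall share in opposite directions, while the age subgroups do not deviate at all. A realistic search would also consider conjunctions of conditions and would need pruning and a correction for multiple testing, which this sketch omits.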
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Klösgen, W. (1998). Deviation and Association Patterns for Subgroup Mining in Temporal, Spatial, and Textual Data Bases. In: Polkowski, L., Skowron, A. (eds) Rough Sets and Current Trends in Computing. RSCTC 1998. Lecture Notes in Computer Science, vol 1424. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-69115-4_1
DOI: https://doi.org/10.1007/3-540-69115-4_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64655-6
Online ISBN: 978-3-540-69115-0
eBook Packages: Springer Book Archive