Abstract
Finding all interesting patterns in a database is a data mining task that typically requires a complete search through the hypothesis space. Several ILP systems address this task, e.g., [Deh98], [Wro97], [FL01]. Safe pruning techniques, which reduce the size of the hypothesis space without the risk of missing interesting patterns, are therefore very important for this task. This paper is concerned with the effectiveness of pruning techniques in this setting. The pruning techniques addressed are (1) optimum estimates, (2) a pruning technique based on subset tests, derived from the Apriori search algorithm, (3) pruning based on taxonomies, and (4) considering only the most general patterns as interesting. Methods (1) to (3) are safe pruning techniques that find all interesting patterns; method (4) reduces the number of accepted patterns. The effect of these pruning methods is investigated experimentally within a range of specific task settings and on two databases.
Experimental results indicate that optimum estimates and Apriori-style pruning are effective and reliable pruning techniques that incur little additional cost. The effect of taxonomies on pruning is smaller and varies across task settings. In the experiments, the restriction to most general patterns considerably reduces both the search costs and the set of accepted patterns.
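The Apriori-style subset-test pruning in (2) is easiest to see in its propositional (itemset) analogue: a candidate pattern at level k can be discarded whenever one of its generalisations at level k-1 is already known to be uninteresting, since interestingness (frequency) is anti-monotone under specialisation. A minimal Python sketch of that subset test follows; the `apriori_prune` function and the example itemsets are illustrative assumptions (the paper itself operates on first-order patterns, not itemsets):

```python
from itertools import combinations

def apriori_prune(candidates, frequent_prev):
    """Keep only candidates whose every (k-1)-subset is frequent.

    candidates:    iterable of frozensets, each of size k
    frequent_prev: set of frozensets of size k-1 known to be frequent
    """
    kept = []
    for cand in candidates:
        # Every (k-1)-element generalisation must already be frequent,
        # otherwise the candidate cannot be frequent either.
        subsets = (frozenset(s) for s in combinations(cand, len(cand) - 1))
        if all(s in frequent_prev for s in subsets):
            kept.append(cand)
    return kept

# Example: suppose these 2-itemsets were found frequent at the previous level.
frequent_2 = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]}
candidates_3 = [frozenset(["a", "b", "c"]), frozenset(["a", "b", "d"])]

# {a,b,d} is pruned because its subset {a,d} is not frequent.
print(apriori_prune(candidates_3, frequent_2))
```

The point of the test is that it needs no database access at all: candidates are rejected purely by lookups against the previous level's results, which is why this kind of pruning adds so little overhead.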
References
R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and I. Verkamo. Fast discovery of association rules. In U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining. MIT Press, Cambridge, MA, 1996.
Luc Dehaspe. Frequent pattern discovery in first-order logic. PhD thesis, K.U.Leuven, December 1998.
L. Fleury, C. Djeraba, J. Philippe, and H. Briand. Contribution of the implication intensity in rules evaluations for knowledge discovery in databases. In Y. Kodratoff, G. Nakhaeizadeh, and C. Taylor, editors, Workshop Notes of the ECML-95 Workshop Statistics, Machine Learning and Knowledge Discovery in Databases, 1995.
P. Flach and N. Lachiche. Confirmation-guided discovery of first-order rules with Tertius. Machine Learning, 42(1/2):61–95, 2001.
R. Gras and A. Larher. L’implication statistique, une nouvelle méthode d’analyse de données. Mathématique, Informatique et Sciences Humaines, (120), 1993.
S. Rapp. Automatic labeling of German prosody. In Proc. of Int. Conference on Spoken Language Processing (ICSLP’98), 1998.
I. Weber. Level-wise search and pruning strategies for first-order hypothesis spaces. Journal of Intelligent Information Systems, 14(2–3):217–239, 2000.
Stefan Wrobel. An algorithm for multi-relational discovery of subgroups. In J. Komorowski and J. Zytkow, editors, Proc. First European Symposium on Principles of Knowledge Discovery and Data Mining. Springer, 1997.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Weber, I. (2003). Experimental Investigation of Pruning Methods for Relational Pattern Discovery. In: Matwin, S., Sammut, C. (eds) Inductive Logic Programming. ILP 2002. Lecture Notes in Computer Science, vol 2583. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36468-4_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00567-4
Online ISBN: 978-3-540-36468-9