Abstract
Data mining algorithms often produce sets of rules so large that analyzing them manually is practically impossible. It is thus important to develop methods for removing redundant rules from such sets. We present a solution to this problem based on the Maximum Entropy approach. The efficiency of the Maximum Entropy computations is addressed by using closed-form solutions for the most frequent cases. Analytical and experimental evaluation of the proposed technique indicates that it efficiently produces small sets of interesting association rules.
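The pruning idea can be sketched as follows (a minimal, brute-force illustration, not the paper's closed-form algorithm; all function names and the ε threshold are assumptions): fit the maximum entropy distribution consistent with the supports of the already-accepted itemsets via iterative proportional scaling, then treat a candidate rule as redundant if its observed confidence matches the confidence that model already predicts.

```python
from itertools import product

def maxent_distribution(n_items, constraints, iters=200):
    """Maximum entropy distribution over {0,1}^n_items subject to
    P(all items in S present) = target for each (S, target) constraint,
    fitted by iterative proportional scaling over all possible worlds."""
    worlds = list(product([0, 1], repeat=n_items))
    p = dict.fromkeys(worlds, 1.0 / len(worlds))  # uniform = unconstrained maxent
    for _ in range(iters):
        for items, target in constraints.items():
            cur = sum(q for w, q in p.items() if all(w[i] for i in items))
            if cur <= 0.0 or cur >= 1.0:
                continue
            # Rescale worlds to satisfy this constraint while keeping total mass 1.
            for w in p:
                if all(w[i] for i in items):
                    p[w] *= target / cur
                else:
                    p[w] *= (1.0 - target) / (1.0 - cur)
    return p

def support(p, items):
    """Probability that every item in `items` is present."""
    return sum(q for w, q in p.items() if all(w[i] for i in items))

def is_redundant(antecedent, consequent, observed_conf, constraints,
                 n_items, eps=0.01):
    """A rule A -> B is redundant if its observed confidence matches the
    confidence predicted by the maxent model built from the other rules."""
    p = maxent_distribution(n_items, constraints)
    predicted = support(p, antecedent | consequent) / support(p, antecedent)
    return abs(predicted - observed_conf) < eps
```

For example, with only the single-item supports supp(A) = 0.6 and supp(B) = 0.5 as constraints, the maximum entropy model makes A and B independent, so a rule A → B with observed confidence 0.5 adds no information and would be pruned, while one with confidence 0.9 would be kept as interesting. The exponential enumeration of worlds is exactly the cost the paper's closed-form solutions are designed to avoid in the most frequent cases.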
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jaroszewicz, S., Simovici, D.A. (2002). Pruning Redundant Association Rules Using Maximum Entropy Principle. In: Chen, MS., Yu, P.S., Liu, B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2002. Lecture Notes in Computer Science(), vol 2336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47887-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43704-8
Online ISBN: 978-3-540-47887-4