Abstract
Data mining algorithms often produce sets of rules so large that analyzing them manually is practically impossible. It is thus important to develop methods for removing redundant rules from such sets. We present a solution to this problem based on the Maximum Entropy approach. The efficiency of the Maximum Entropy computations is addressed by using closed-form solutions for the most frequent cases. Analytical and experimental evaluation of the proposed technique indicates that it efficiently produces small sets of interesting association rules.
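The pruning idea can be sketched as follows (a minimal, brute-force illustration, not the paper's closed-form algorithm; all function names and the ε threshold are assumptions): fit the maximum entropy distribution consistent with the supports of the already-accepted itemsets via iterative proportional scaling, then treat a candidate rule as redundant if its observed confidence matches the confidence that model already predicts.

```python
from itertools import product

def maxent_distribution(n_items, constraints, iters=200):
    """Maximum entropy distribution over {0,1}^n_items subject to
    P(all items in S present) = target for each (S, target) constraint,
    fitted by iterative proportional scaling over all possible worlds."""
    worlds = list(product([0, 1], repeat=n_items))
    p = dict.fromkeys(worlds, 1.0 / len(worlds))  # uniform = unconstrained maxent
    for _ in range(iters):
        for items, target in constraints.items():
            cur = sum(q for w, q in p.items() if all(w[i] for i in items))
            if cur <= 0.0 or cur >= 1.0:
                continue
            # Rescale worlds to satisfy this constraint while keeping total mass 1.
            for w in p:
                if all(w[i] for i in items):
                    p[w] *= target / cur
                else:
                    p[w] *= (1.0 - target) / (1.0 - cur)
    return p

def support(p, items):
    """Probability that every item in `items` is present."""
    return sum(q for w, q in p.items() if all(w[i] for i in items))

def is_redundant(antecedent, consequent, observed_conf, constraints,
                 n_items, eps=0.01):
    """A rule A -> B is redundant if its observed confidence matches the
    confidence predicted by the maxent model built from the other rules."""
    p = maxent_distribution(n_items, constraints)
    predicted = support(p, antecedent | consequent) / support(p, antecedent)
    return abs(predicted - observed_conf) < eps
```

For example, with only the single-item supports supp(A) = 0.6 and supp(B) = 0.5 as constraints, the maximum entropy model makes A and B independent, so a rule A → B with observed confidence 0.5 adds no information and would be pruned, while one with confidence 0.9 would be kept as interesting. The exponential enumeration of worlds is exactly the cost the paper's closed-form solutions are designed to avoid in the most frequent cases.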
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jaroszewicz, S., Simovici, D.A. (2002). Pruning Redundant Association Rules Using Maximum Entropy Principle. In: Chen, MS., Yu, P.S., Liu, B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2002. Lecture Notes in Computer Science(), vol 2336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47887-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43704-8
Online ISBN: 978-3-540-47887-4