
Pruning Redundant Association Rules Using Maximum Entropy Principle

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2336)

Abstract

Data mining algorithms produce huge rule sets that are practically impossible to analyze manually. It is thus important to develop methods for removing redundant rules from such sets. We present a solution to this problem based on the Maximum Entropy approach. The efficiency of Maximum Entropy computations is addressed by using closed-form solutions for the most frequent cases. Analytical and experimental evaluation of the proposed technique indicates that it efficiently produces small sets of interesting association rules.
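
The paper's closed-form solutions are not reproduced on this page, but the underlying idea can be sketched. The toy implementation below (all names are illustrative, not from the paper) fits a maximum-entropy distribution over a small set of binary items to itemset-support constraints via iterative proportional fitting, one standard way to realize generalized iterative scaling, and then flags a rule as redundant when the confidence predicted by the fitted model already matches the value observed in the data:

```python
from itertools import product

def fit_maxent(n_items, constraints, iters=200):
    """Fit a maximum-entropy distribution over n_items binary attributes,
    subject to itemset-support constraints, via iterative proportional
    fitting. constraints maps item-index tuples to target supports."""
    states = list(product([0, 1], repeat=n_items))
    # The uniform distribution is the unconstrained maximum-entropy start.
    p = {s: 1.0 / len(states) for s in states}
    for _ in range(iters):
        for itemset, target in constraints.items():
            cur = sum(q for s, q in p.items() if all(s[i] for i in itemset))
            if 0.0 < cur < 1.0:
                for s in p:
                    if all(s[i] for i in itemset):
                        p[s] *= target / cur            # satisfying states
                    else:
                        p[s] *= (1 - target) / (1 - cur)  # the rest
    return p

def support(p, itemset):
    """Probability that every item in itemset is present."""
    return sum(q for s, q in p.items() if all(s[i] for i in itemset))

# Toy constraints: supports of single items A (index 0) and B (index 1).
model = fit_maxent(2, {(0,): 0.5, (1,): 0.4})

# Given only the marginals, the maximum-entropy model makes A and B
# independent, so it predicts supp(A,B) = 0.5 * 0.4 = 0.2 and
# conf(A -> B) = 0.2 / 0.5 = 0.4.
pred_conf = support(model, (0, 1)) / support(model, (0,))

# If the confidence observed in the data is close to the model's
# prediction, the rule A -> B carries no information beyond the
# constraints already accepted, and can be pruned as redundant.
observed_conf = 0.41
is_redundant = abs(observed_conf - pred_conf) < 0.05
```

Enumerating all 2^n states is exponential, which is exactly why efficiency matters; the paper's contribution of closed-form solutions for the most frequent cases avoids this iterative fitting in practice.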



Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jaroszewicz, S., Simovici, D.A. (2002). Pruning Redundant Association Rules Using Maximum Entropy Principle. In: Chen, MS., Yu, P.S., Liu, B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2002. Lecture Notes in Computer Science(), vol 2336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47887-6_13

  • DOI: https://doi.org/10.1007/3-540-47887-6_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43704-8

  • Online ISBN: 978-3-540-47887-4
