Mining Non-redundant Information-Theoretic Dependencies between Itemsets

Mampaey, Michael

doi:10.1007/978-3-642-15105-7_11

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6263)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

We present an information-theoretic framework for mining dependencies between itemsets in binary data. The problem of closure-based redundancy in this context is theoretically investigated, and we present both lossless and lossy pruning techniques. An efficient and scalable algorithm is proposed, which exploits the inclusion-exclusion principle for fast entropy computation. This algorithm is empirically evaluated through experiments on synthetic and real-world data.

An extended version of this paper is available as a technical report [1].
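
To illustrate the idea of computing entropy through inclusion-exclusion, the following is a minimal sketch, not the algorithm from the paper. It assumes that the data is a list of transactions (Python sets of items), that the entropy of an itemset is the Shannon entropy of the joint distribution of its items' presence or absence, and that a dependency X → Y is scored by the conditional entropy H(Y | X) = H(X ∪ Y) − H(X). The function names `support`, `entropy`, and `conditional_entropy` are hypothetical.

```python
from itertools import combinations
from math import log2


def support(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def entropy(transactions, itemset):
    """Entropy (in bits) of the presence/absence pattern of `itemset`.

    The count of each 0/1 pattern is derived from the supports of
    subsets of `itemset` by inclusion-exclusion, instead of being
    counted directly per pattern.
    """
    items = sorted(itemset)
    n = len(transactions)

    # Supports of all subsets of the itemset (exponential in its size;
    # this naive enumeration is only meant for small itemsets).
    supp = {}
    for r in range(len(items) + 1):
        for sub in combinations(items, r):
            supp[frozenset(sub)] = support(transactions, frozenset(sub))

    h = 0.0
    for r in range(len(items) + 1):
        for present in combinations(items, r):
            a = frozenset(present)
            absent = [i for i in items if i not in a]
            # Count of transactions where exactly the items in `a` occur:
            # sum over B with a ⊆ B ⊆ itemset of (-1)^{|B \ a|} * supp(B).
            count = 0
            for s in range(len(absent) + 1):
                for extra in combinations(absent, s):
                    count += (-1) ** s * supp[a | frozenset(extra)]
            if count > 0:
                p = count / n
                h -= p * log2(p)
    return h


def conditional_entropy(transactions, x, y):
    """H(Y | X) = H(X ∪ Y) − H(X); a value of 0 means X determines Y."""
    xy = frozenset(x) | frozenset(y)
    return entropy(transactions, xy) - entropy(transactions, frozenset(x))
```

For example, on the four transactions `[{"a", "b"}, {"a", "b"}, set(), set()]`, where `a` and `b` always co-occur, `conditional_entropy(data, {"a"}, {"b"})` evaluates to 0, while on `[{"a", "b"}, {"a"}, {"b"}, set()]` the joint entropy of `{"a", "b"}` is 2 bits.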

References

1. Mampaey, M.: Mining non-redundant information-theoretic dependencies between itemsets. Technical Report, University of Antwerp (2010)
2. Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. ACM SIGMOD Record 22(2), 207–216 (1993)
3. Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. ACM SIGMOD Record 29(2), 1–12 (2000)
4. Zaki, M., Parthasarathy, S., Ogihara, M., Li, W., et al.: New algorithms for fast discovery of association rules. In: Proceedings of KDD (1997)
5. Srikant, R., Agrawal, R.: Mining quantitative association rules in large relational tables. ACM SIGMOD Record 25(2), 1–12 (1996)
6. Kivinen, J., Mannila, H.: Approximate inference of functional dependencies from relations. Theoretical Computer Science 149(1), 129–149 (1995)
7. Huhtala, Y., Karkkainen, J., Porkka, P., Toivonen, H.: TANE: An efficient algorithm for discovering functional and approximate dependencies. The Computer Journal 42(2), 100–111 (1999)
8. Zaki, M.J.: Generating non-redundant association rules. In: Proceedings of KDD, pp. 34–43 (2000)
9. Balcázar, J.L.: Minimum-size bases of association rules. In: Proceedings of ECML PKDD, pp. 86–101 (2008)
10. Dalkilic, M.M., Robertson, E.L.: Information dependencies. In: Proceedings of ACM PODS, pp. 245–253 (2000)
11. Heikinheimo, H., Hinkkanen, E., Mannila, H., Mielikäinen, T., Seppänen, J.K.: Finding low-entropy sets and trees from binary data. In: Proceedings of KDD, pp. 350–359 (2007)
12. Jaroszewicz, S., Simovici, D.A.: Pruning redundant association rules using maximum entropy principle. In: Chen, M.-S., Yu, P.S., Liu, B. (eds.) PAKDD 2002. LNCS (LNAI), vol. 2336, pp. 135–147. Springer, Heidelberg (2002)
13. Shannon, C.E.: A mathematical theory of communication. Bell System Technical Journal 27, 379–423 (1948)
14. Bayardo Jr., R.: Efficiently mining long patterns from databases. In: Proceedings of ACM SIGMOD, pp. 85–93 (1998)
15. Gouda, K., Zaki, M.: Efficiently mining maximal frequent itemsets. In: Proceedings of IEEE ICDM, pp. 163–170 (2001)
16. Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering frequent closed itemsets for association rules. In: Proceedings of ICDT, pp. 398–416 (1999)
17. Calders, T., Goethals, B.: Non-derivable itemset mining. Data Mining and Knowledge Discovery 14(1), 171–206 (2007)
18. Calders, T., Goethals, B.: Quick inclusion-exclusion. In: Bonchi, F., Boulicaut, J.-F. (eds.) KDID 2005. LNCS, vol. 3933, pp. 86–103. Springer, Heidelberg (2006)
19. Goethals, B.: Frequent itemset mining implementations repository, http://fimi.cs.helsinki.fi/data
20. Huhtala, Y., Karkkainen, J., Porkka, P., Toivonen, H.: TANE homepage, http://www.cs.helsinki.fi/research/fdk/datamining/tane

Author information

Authors and Affiliations

Dept. of Mathematics and Computer Science, University of Antwerp, Belgium
Michael Mampaey

Editor information

Editors and Affiliations

Department of Computer Science, Aalborg University, Selma Lagerløfs Vej 300, 9220, Aalborg, Denmark
Torben Bach Pedersen
IBM India Research Lab, 4, Block C, Institutional Area, Vasant Kunj, 110 070, New Delhi, India
Mukesh K. Mohania
Institute of Software Technology, Vienna University of Technology, Favoritenstr. 9-11/188, 1040, Vienna, Austria
A Min Tjoa

About this paper

Cite this paper

Mampaey, M. (2010). Mining Non-redundant Information-Theoretic Dependencies between Itemsets. In: Bach Pedersen, T., Mohania, M.K., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2010. Lecture Notes in Computer Science, vol 6263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15105-7_11

DOI: https://doi.org/10.1007/978-3-642-15105-7_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15104-0
Online ISBN: 978-3-642-15105-7
eBook Packages: Computer Science, Computer Science (R0)
