Abstract
We present an information-theoretic framework for mining dependencies between itemsets in binary data. We investigate the problem of closure-based redundancy in this setting theoretically, and present both lossless and lossy pruning techniques. We propose an efficient and scalable algorithm that exploits the inclusion-exclusion principle for fast entropy computation, and evaluate it empirically on synthetic and real-world data.
An extended version of this paper is available as a technical report [1].
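To illustrate how inclusion-exclusion can yield entropies from itemset supports, the following is a minimal sketch and not the algorithm of the paper. It relies on the standard identity that, for an itemset X and a subset Y of X, the fraction of rows in which exactly the items of Y (among those of X) are present equals the sum over all Z with Y ⊆ Z ⊆ X of (-1)^{|Z \ Y|} supp(Z). The function names `support`, `subsets`, and `entropy_via_inclusion_exclusion`, and the representation of rows as frozensets, are assumptions made for this example.

```python
from itertools import combinations
from math import log2


def support(rows, itemset):
    """Fraction of rows that contain every item of `itemset`."""
    return sum(1 for r in rows if itemset <= r) / len(rows)


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = sorted(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def entropy_via_inclusion_exclusion(rows, itemset):
    """Entropy (in bits) of the joint distribution of the items in `itemset`.

    Hypothetical helper: each cell probability is derived from supports
    only, using inclusion-exclusion, instead of counting cells directly.
    """
    itemset = frozenset(itemset)
    supp = {z: support(rows, z) for z in subsets(itemset)}
    h = 0.0
    for y in subsets(itemset):
        # P(items in y present, remaining items of itemset absent)
        p = sum((-1) ** len(z - y) * s for z, s in supp.items() if y <= z)
        if p > 1e-12:  # skip empty cells (and tiny rounding noise)
            h -= p * log2(p)
    return h


# Two independent, balanced items carry two bits of entropy in total.
rows = [frozenset("ab"), frozenset("a"), frozenset("b"), frozenset()]
h_ab = entropy_via_inclusion_exclusion(rows, "ab")
```

In this sketch the supports are computed by scanning the rows, which is only for self-containment; in a mining setting they would typically already be available from a frequent itemset miner, so that entropies can be obtained without further passes over the data.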
References
1. Mampaey, M.: Mining non-redundant information-theoretic dependencies between itemsets. Technical Report, University of Antwerp (2010)
2. Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. ACM SIGMOD Record 22(2), 207–216 (1993)
3. Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. ACM SIGMOD Record 29(2), 1–12 (2000)
4. Zaki, M., Parthasarathy, S., Ogihara, M., Li, W., et al.: New algorithms for fast discovery of association rules. In: Proceedings of KDD (1997)
5. Srikant, R., Agrawal, R.: Mining quantitative association rules in large relational tables. ACM SIGMOD Record 25(2), 1–12 (1996)
6. Kivinen, J., Mannila, H.: Approximate inference of functional dependencies from relations. Theoretical Computer Science 149(1), 129–149 (1995)
7. Huhtala, Y., Karkkainen, J., Porkka, P., Toivonen, H.: TANE: An efficient algorithm for discovering functional and approximate dependencies. The Computer Journal 42(2), 100–111 (1999)
8. Zaki, M.J.: Generating non-redundant association rules. In: Proceedings of KDD, pp. 34–43 (2000)
9. Balcázar, J.L.: Minimum-size bases of association rules. In: Proceedings of ECML PKDD, pp. 86–101 (2008)
10. Dalkilic, M.M., Robertson, E.L.: Information dependencies. In: Proceedings of ACM PODS, pp. 245–253 (2000)
11. Heikinheimo, H., Hinkkanen, E., Mannila, H., Mielikäinen, T., Seppänen, J.K.: Finding low-entropy sets and trees from binary data. In: Proceedings of KDD, pp. 350–359 (2007)
12. Jaroszewicz, S., Simovici, D.A.: Pruning redundant association rules using maximum entropy principle. In: Chen, M.-S., Yu, P.S., Liu, B. (eds.) PAKDD 2002. LNCS (LNAI), vol. 2336, pp. 135–147. Springer, Heidelberg (2002)
13. Shannon, C.E.: A mathematical theory of communication. Bell System Technical Journal 27, 379–423 (1948)
14. Bayardo Jr., R.: Efficiently mining long patterns from databases. In: Proceedings of ACM SIGMOD, pp. 85–93 (1998)
15. Gouda, K., Zaki, M.: Efficiently mining maximal frequent itemsets. In: Proceedings of IEEE ICDM, pp. 163–170 (2001)
16. Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering frequent closed itemsets for association rules. In: Proceedings of ICDT, pp. 398–416 (1999)
17. Calders, T., Goethals, B.: Non-derivable itemset mining. Data Mining and Knowledge Discovery 14(1), 171–206 (2007)
18. Calders, T., Goethals, B.: Quick inclusion-exclusion. In: Bonchi, F., Boulicaut, J.-F. (eds.) KDID 2005. LNCS, vol. 3933, pp. 86–103. Springer, Heidelberg (2006)
19. Goethals, B.: Frequent itemset mining implementations repository, http://fimi.cs.helsinki.fi/data
20. Huhtala, Y., Karkkainen, J., Porkka, P., Toivonen, H.: TANE homepage, http://www.cs.helsinki.fi/research/fdk/datamining/tane
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mampaey, M. (2010). Mining Non-redundant Information-Theoretic Dependencies between Itemsets. In: Bach Pedersen, T., Mohania, M.K., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2010. Lecture Notes in Computer Science, vol 6263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15105-7_11
DOI: https://doi.org/10.1007/978-3-642-15105-7_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15104-0
Online ISBN: 978-3-642-15105-7
eBook Packages: Computer Science (R0)