Abstract
We introduce the problem of diverse dimension decomposition in transactional databases, where a dimension is a set of mutually exclusive itemsets. The problem requires finding a decomposition of the itemset space into dimensions that are orthogonal to each other and that provide high coverage of the input database. The mining framework we propose can be interpreted as a dimensionality-reducing transformation from the space of all items to the space of orthogonal dimensions. Relying on information-theoretic concepts, we formulate the diverse dimension decomposition problem with a single objective function that simultaneously captures constraints on coverage, exclusivity, and orthogonality. We show that our problem is NP-hard, and we propose a greedy algorithm that exploits the well-known FP-tree data structure. Our algorithm is equipped with strategies for pruning the search space that derive directly from the objective function. We also prove a property that allows us to assess the informativeness of newly added dimensions, and thus to define criteria for terminating the decomposition. We demonstrate the effectiveness of our solution through experimental evaluation on synthetic datasets with known dimensions and on three real-world datasets: flickr, del.icio.us and dblp. The problem we study is largely motivated by applications in the domain of collaborative tagging; however, the mining task we introduce in this paper is useful in other application domains as well.
Cite this article
Tsytsarau, M., Bonchi, F., Gionis, A. et al. Diverse dimension decomposition for itemset spaces. Knowl Inf Syst 33, 447–473 (2012). https://doi.org/10.1007/s10115-012-0518-5
DOI: https://doi.org/10.1007/s10115-012-0518-5