Diverse dimension decomposition for itemset spaces

  • Regular paper
  • Knowledge and Information Systems

Abstract

We introduce the problem of diverse dimension decomposition in transactional databases, where a dimension is a set of mutually exclusive itemsets. The problem is to find a decomposition of the itemset space into dimensions that are orthogonal to each other and that provide high coverage of the input database. The mining framework we propose can be interpreted as a dimensionality-reducing transformation from the space of all items to the space of orthogonal dimensions. Relying on information-theoretic concepts, we formulate the diverse dimension decomposition problem with a single objective function that simultaneously captures constraints on coverage, exclusivity, and orthogonality. We show that our problem is NP-hard, and we propose a greedy algorithm that exploits the well-known FP-tree data structure. Our algorithm is equipped with strategies for pruning the search space that derive directly from the objective function. We also prove a property that allows us to assess the informativeness of newly added dimensions, and thus to define criteria for terminating the decomposition. We demonstrate the effectiveness of our solution through experimental evaluation on synthetic datasets with known dimensions and on three real-world datasets: flickr, del.icio.us, and dblp. The problem we study is largely motivated by applications in collaborative tagging; however, the mining task we introduce is useful in other application domains as well.
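To make the terms in the abstract concrete, the following is a minimal Python sketch on a toy tag database. It treats a dimension as a list of itemsets and computes simple proxies for the three properties named above: the fraction of covered transactions (coverage), the number of transactions matching more than one itemset (exclusivity violations), and the mutual information between two dimensions (near zero when they are orthogonal). These proxies, the function names, and the toy data are assumptions chosen for illustration; the abstract does not give the paper's single objective function, and this sketch does not reproduce it.

```python
from collections import Counter
from math import log2

def assign(transaction, dimension):
    """Indices of the itemsets in `dimension` that `transaction` contains."""
    return [i for i, itemset in enumerate(dimension) if itemset <= transaction]

def value(transaction, dimension):
    """The single matching itemset index, or None if zero or several match."""
    hits = assign(transaction, dimension)
    return hits[0] if len(hits) == 1 else None

def coverage(db, dimension):
    """Fraction of transactions matching at least one itemset."""
    return sum(1 for t in db if assign(t, dimension)) / len(db)

def exclusivity_violations(db, dimension):
    """Number of transactions matching more than one itemset."""
    return sum(1 for t in db if len(assign(t, dimension)) > 1)

def entropy(db, dimension):
    """Shannon entropy (bits) of the dimension over uniquely matched transactions."""
    counts = Counter(v for v in (value(t, dimension) for t in db) if v is not None)
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values()) if n else 0.0

def mutual_information(db, dim_a, dim_b):
    """Mutual information (bits) between two dimensions; 0 means independent."""
    pairs = [(value(t, dim_a), value(t, dim_b)) for t in db]
    pairs = [p for p in pairs if None not in p]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    pa = Counter(a for a, _ in pairs)
    pb = Counter(b for _, b in pairs)
    return sum(c / n * log2(c * n / (pa[a] * pb[b])) for (a, b), c in joint.items())

# Hypothetical toy database: each transaction is the tag set of one photo.
db = [
    frozenset({"red", "paris"}),
    frozenset({"red", "rome"}),
    frozenset({"blue", "paris"}),
    frozenset({"blue", "rome"}),
    frozenset({"green", "paris"}),  # not covered by the colour dimension
]
color = [frozenset({"red"}), frozenset({"blue"})]
place = [frozenset({"paris"}), frozenset({"rome"})]

print(coverage(db, color), exclusivity_violations(db, color))
print(entropy(db, color), mutual_information(db, color, place))
```

In this toy setting the colour and place dimensions are each mutually exclusive and carry no information about each other, which is the intuition behind calling them orthogonal.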

Author information

Corresponding author

Correspondence to Mikalai Tsytsarau.

About this article

Cite this article

Tsytsarau, M., Bonchi, F., Gionis, A. et al. Diverse dimension decomposition for itemset spaces. Knowl Inf Syst 33, 447–473 (2012). https://doi.org/10.1007/s10115-012-0518-5

  • DOI: https://doi.org/10.1007/s10115-012-0518-5
