Multi-objective frequent termset clustering

  • Regular Paper
Knowledge and Information Systems

Abstract

Large media collections evolve rapidly on the World Wide Web. In addition to targeted retrieval as performed by search engines, browsing and explorative navigation are important. Since the collections grow fast and authors most often do not annotate their web pages according to a given ontology, automatic structuring is in demand as a prerequisite for any pleasant human–computer interface. In this paper, we investigate the problem of finding alternative high-quality structures for navigation in a large collection of high-dimensional data. We express desired properties of frequent termset (FTS) clustering in terms of objective functions. In general, these functions conflict. This leads to the formulation of FTS clustering as a multi-objective optimization problem, which is solved by a genetic algorithm. The result is a set of Pareto-optimal solutions. Users may choose their favorite type of structure for navigating a collection or explore the different views given by the different optimal solutions. We evaluate, on a social bookmarking data set, the capability of the new approach to produce structures that are well suited for browsing.
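To make the notion of Pareto-optimal solutions concrete, the following minimal sketch filters a set of candidate clusterings down to the non-dominated ones. It is an illustration only, not the authors' method: the two objectives (named here `coverage` and `low_overlap`), the score values, and the function names are hypothetical assumptions, both objectives are assumed to be maximized, and the genetic search that would generate the candidates is not shown.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector a dominates b (all objectives maximized).

    a dominates b when a is at least as good on every objective
    and strictly better on at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q != p)
    ]


# Hypothetical (coverage, low_overlap) scores for five candidate clusterings.
# These numbers are made up for illustration and do not come from the paper.
candidates = [(0.9, 0.2), (0.7, 0.6), (0.5, 0.5), (0.4, 0.9), (0.9, 0.1)]

front = pareto_front(candidates)
print(front)
```

In this toy example, `(0.5, 0.5)` is dominated by `(0.7, 0.6)` and `(0.9, 0.1)` by `(0.9, 0.2)`, so three candidates remain. None of the three is better than another on both objectives at once, which is why a user can reasonably choose among them.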

Author information

Corresponding author

Correspondence to Katharina Morik.

About this article

Cite this article

Morik, K., Kaspari, A., Wurst, M. et al. Multi-objective frequent termset clustering. Knowl Inf Syst 30, 715–738 (2012). https://doi.org/10.1007/s10115-011-0431-3

  • DOI: https://doi.org/10.1007/s10115-011-0431-3

Keywords

Navigation