Multi-objective frequent termset clustering

Morik, Katharina; Kaspari, Andreas; Wurst, Michael; Skirzynski, Marcin

doi:10.1007/s10115-011-0431-3

Regular Paper
Published: 19 July 2011

Volume 30, pages 715–738, (2012)

Knowledge and Information Systems

Katharina Morik¹, Andreas Kaspari², Michael Wurst³ & Marcin Skirzynski¹

Abstract

Large media collections are evolving rapidly on the World Wide Web. In addition to targeted retrieval, as performed by search engines, browsing and exploratory navigation are important. Since the collections grow fast and authors mostly do not annotate their web pages according to a given ontology, automatic structuring is needed as a prerequisite for any pleasant human–computer interface. In this paper, we investigate the problem of finding alternative high-quality structures for navigation in a large collection of high-dimensional data. We express desired properties of frequent termset (FTS) clustering in terms of objective functions. In general, these functions conflict. This leads to the formulation of FTS clustering as a multi-objective optimization problem, which is solved by a genetic algorithm. The result is a set of Pareto-optimal solutions. Users may choose their preferred type of structure for navigating a collection or explore the different views given by the different optimal solutions. We assess how well the new approach produces structures suited to browsing a social bookmarking data set.
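The abstract does not spell out the concrete objectives or the mining step, so the following Python sketch only illustrates the general idea under stated assumptions. Frequent termsets are mined Apriori-style (Agrawal and Srikant, 1994); each selected termset defines one cluster, namely the documents containing all of its terms; and two hypothetical objectives, coverage and label specificity, are traded off against each other. Instead of the genetic algorithm used in the paper, the sketch enumerates all small selections exhaustively, which is feasible only for toy data. The names `frequent_termsets`, `objectives` and `pareto_front`, as well as the sample tag data, are illustrative and are not taken from the authors' implementation.

```python
from itertools import combinations


def frequent_termsets(docs, min_support, max_size=2):
    """Apriori-style level-wise search.

    Returns {termset: ids of documents containing every term} for all
    termsets of at most max_size terms found in >= min_support documents.
    """
    cover = {}
    for doc_id, terms in enumerate(docs):
        for term in terms:
            cover.setdefault(frozenset([term]), set()).add(doc_id)
    level = {ts: ids for ts, ids in cover.items() if len(ids) >= min_support}
    result = dict(level)
    for size in range(2, max_size + 1):
        candidates = {}
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) != size or union in candidates:
                continue
            # Apriori pruning: every (size-1)-subset must itself be frequent.
            if all(union - {t} in level for t in union):
                # Documents containing a|b = documents containing a AND b.
                candidates[union] = level[a] & level[b]
        level = {ts: ids for ts, ids in candidates.items() if len(ids) >= min_support}
        result.update(level)
    return result


def objectives(selection, termsets, n_docs):
    """Score one candidate clustering (a tuple of termsets) on two
    HYPOTHETICAL objectives, both maximized:
    coverage    = fraction of documents that fall into some cluster,
    specificity = mean number of terms per cluster label."""
    covered = set()
    for ts in selection:
        covered |= termsets[ts]
    coverage = len(covered) / n_docs
    specificity = sum(len(ts) for ts in selection) / len(selection)
    return (coverage, specificity)


def dominates(p, q):
    """p Pareto-dominates q: no worse in every objective, better in one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def pareto_front(termsets, n_docs, max_clusters=2):
    """Score every selection of 1..max_clusters termsets by brute force
    (a stand-in for the paper's genetic algorithm) and keep only the
    non-dominated selections."""
    keys = sorted(termsets, key=sorted)  # deterministic order
    scored = [
        (sel, objectives(sel, termsets, n_docs))
        for k in range(1, max_clusters + 1)
        for sel in combinations(keys, k)
    ]
    return [(sel, s) for sel, s in scored
            if not any(dominates(t, s) for _, t in scored)]


# Toy bookmarks, each represented by its set of tags (made-up data).
docs = [
    {"python", "programming"},
    {"python", "tutorial"},
    {"python", "programming", "tutorial"},
    {"music", "jazz"},
    {"music", "rock"},
    {"music", "jazz", "vinyl"},
]
termsets = frequent_termsets(docs, min_support=2)
front = pareto_front(termsets, len(docs))
for sel, (cov, spec) in front:
    labels = [" ".join(sorted(ts)) for ts in sel]
    print(labels, f"coverage={cov:.2f}", f"specificity={spec:.1f}")
```

On this toy data no single clustering wins: covering every bookmark requires the broad labels `python` and `music`, while more specific labels such as `jazz music` leave some bookmarks unreachable. The front therefore keeps several trade-offs side by side, mirroring the abstract's point that users can choose among different optimal structures.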

Author information

Authors and Affiliations

Technical University Dortmund, Computer Science VIII, Dortmund, Germany
Katharina Morik & Marcin Skirzynski
Duisburg, Germany
Andreas Kaspari
Smarter Cities Technology Center, IBM Dublin, Dublin, Ireland
Michael Wurst

Corresponding author

Correspondence to Katharina Morik.

About this article

Cite this article

Morik, K., Kaspari, A., Wurst, M. et al. Multi-objective frequent termset clustering. Knowl Inf Syst 30, 715–738 (2012). https://doi.org/10.1007/s10115-011-0431-3

Received: 13 October 2008
Revised: 08 April 2011
Accepted: 25 June 2011
Published: 19 July 2011
Issue Date: March 2012
DOI: https://doi.org/10.1007/s10115-011-0431-3
