Abstract
Large media collections evolve rapidly on the World Wide Web. In addition to the targeted retrieval performed by search engines, browsing and explorative navigation are important. Since the collections grow fast and authors most often do not annotate their web pages according to a given ontology, automatic structuring is needed as a prerequisite for any pleasant human–computer interface. In this paper, we investigate the problem of finding alternative high-quality structures for navigation in a large collection of high-dimensional data. We express desired properties of frequent termset (FTS) clustering in terms of objective functions. In general, these functions conflict, which leads to the formulation of FTS clustering as a multi-objective optimization problem. The optimization problem is solved by a genetic algorithm, and the result is a set of Pareto-optimal solutions. Users may choose their favorite type of structure for navigating a collection or explore the different views given by the different optimal solutions. On a social bookmarking data set, we explore the capability of the new approach to produce structures that are well suited for browsing.
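To make the notion of a Pareto-optimal set concrete, the following is a minimal sketch of Pareto-dominance filtering over candidate solutions scored on two conflicting objectives. It is not the genetic algorithm used in the paper. The function names, the two unnamed objectives, and the score values are hypothetical, and both objectives are assumed to be maximized.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` (all objectives maximized).

    `a` dominates `b` when it is at least as good on every objective
    and strictly better on at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(scores: List[Tuple[float, ...]]) -> List[int]:
    """Return the indices of all candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (objective_a, objective_b) scores for five candidate clusterings.
candidates = [(0.9, 0.2), (0.6, 0.6), (0.5, 0.5), (0.2, 0.9), (0.2, 0.1)]
front = pareto_front(candidates)
print(front)  # indices of the non-dominated candidates
```

Each index in `front` is a different trade-off between the two objectives. In the setting the abstract describes, such alternatives would correspond to the different navigation structures a user can choose from.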
Cite this article
Morik, K., Kaspari, A., Wurst, M. et al. Multi-objective frequent termset clustering. Knowl Inf Syst 30, 715–738 (2012). https://doi.org/10.1007/s10115-011-0431-3
DOI: https://doi.org/10.1007/s10115-011-0431-3