A transduction-based approach to fuzzy clustering, relevance ranking and cluster label generation on web search results

Matsumoto, Takazumi; Hung, Edward

doi:10.1007/s10844-011-0161-8

Published: 11 May 2011

Journal of Intelligent Information Systems, Volume 38, pages 419–448 (2012)

Abstract

This paper details a modular, self-contained web search results clustering system that enhances search results by (i) clustering the lists of web documents returned by queries to search engines, and (ii) ranking the results and labeling the resulting clusters, using a calculated relevance value as the degree of membership of each document in a cluster. In addition, we demonstrate an external evaluation method based on precision for comparing fuzzy clustering techniques, as well as internal measures suitable for working on non-training data. The built-in label generator uses the membership degrees and relevance values to weight the most relevant results more heavily. The membership degrees of documents in fuzzy clusters also facilitate effective detection and removal of overly similar clusters. To achieve this, our transduction-based clustering algorithm (TCA) and its fuzzy counterpart (FTCA) employ a transduction-based relevance model (TRM) to consider local relationships between web documents. Tests on five different real-world and synthetic datasets show favorable results compared to the established label-based clustering algorithms Suffix Tree Clustering (STC) and Lingo.
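To make the idea of using membership degrees to detect overly similar clusters and to rank documents more concrete, the following is a minimal sketch in Python. It is not the TCA, FTCA or TRM described in the paper, whose details are not given in this abstract. The similarity measure (a fuzzy Jaccard index), the threshold of 0.8, the function names and the toy data are all assumptions chosen for illustration.

```python
from typing import Dict, List

# Hypothetical representation: document id -> degree of membership in [0, 1].
Membership = Dict[str, float]


def fuzzy_jaccard(a: Membership, b: Membership) -> float:
    """Fuzzy Jaccard similarity: sum of minima divided by sum of maxima.

    This measure is an assumption for illustration, not the paper's method.
    """
    docs = set(a) | set(b)
    num = sum(min(a.get(d, 0.0), b.get(d, 0.0)) for d in docs)
    den = sum(max(a.get(d, 0.0), b.get(d, 0.0)) for d in docs)
    return num / den if den > 0 else 0.0


def drop_similar_clusters(
    clusters: Dict[str, Membership], threshold: float = 0.8
) -> List[str]:
    """Greedily keep clusters, largest total membership first.

    A cluster is skipped when its similarity to an already kept cluster
    reaches the (assumed) threshold.
    """
    order = sorted(clusters, key=lambda c: sum(clusters[c].values()), reverse=True)
    kept: List[str] = []
    for name in order:
        if all(fuzzy_jaccard(clusters[name], clusters[k]) < threshold for k in kept):
            kept.append(name)
    return kept


def rank_documents(cluster: Membership) -> List[str]:
    """Order the documents of one cluster by descending membership degree."""
    return sorted(cluster, key=lambda d: cluster[d], reverse=True)


# Toy data: two nearly identical clusters and one distinct cluster.
clusters = {
    "jaguar car": {"d1": 0.9, "d2": 0.8, "d3": 0.1},
    "jaguar cars": {"d1": 0.85, "d2": 0.8, "d3": 0.2},
    "jaguar animal": {"d3": 0.9, "d4": 0.7},
}
kept = drop_similar_clusters(clusters)
ranked = rank_documents(clusters["jaguar car"])
```

In this toy example the two car-related clusters share almost all of their membership mass, so only one of them survives, while the animal cluster is kept because it overlaps very little with the others. A real system would need to choose the similarity measure and threshold empirically.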

Author information

Authors and Affiliations

Department of Computing, Hong Kong Polytechnic University, Hung Hom, Hong Kong
Takazumi Matsumoto & Edward Hung

Corresponding author

Correspondence to Takazumi Matsumoto.

Additional information

The work described in this paper was partially supported by grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. PolyU 5191/09E, PolyU 5182/08E, PolyU 5174/07E, PolyU 5181/06E, G-U524).

About this article

Cite this article

Matsumoto, T., Hung, E. A transduction-based approach to fuzzy clustering, relevance ranking and cluster label generation on web search results. J Intell Inf Syst 38, 419–448 (2012). https://doi.org/10.1007/s10844-011-0161-8

Received: 12 October 2010
Revised: 15 February 2011
Accepted: 19 April 2011
Published: 11 May 2011
Issue Date: April 2012
DOI: https://doi.org/10.1007/s10844-011-0161-8
