Abstract
Clustering the web snippets returned by a search engine helps users browse and navigate the results. Because web snippets are extremely short, many traditional clustering techniques that adopt the bag-of-words model often yield unsatisfactory clustering results. In this paper, we propose a text enrichment method for improving the performance of web snippet clustering. The main idea is to expand the original snippets with related conceptual terms. We use the Open Directory Project (ODP), a web taxonomy organized by humans, to provide the concept hierarchy of web content. Using a test data set of 240 queries, we ran experiments with two clustering techniques: K-means clustering as the non-overlapping approach and Suffix Tree Clustering (STC) as the overlapping approach. With the proposed text enrichment method, K-means clustering improved overall performance by up to 15.51% based on the F1 measure, while Suffix Tree Clustering with text enrichment improved performance by up to 53.71%.
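The enrichment idea can be illustrated with a minimal sketch. It is not the authors' implementation: the `CONCEPTS` dictionary is a hypothetical toy stand-in for an ODP lookup, and the helper names `tokenize`, `enrich`, and `cosine` are assumptions introduced only for illustration. The sketch shows how two short snippets with no words in common gain a non-zero bag-of-words similarity once shared concept terms are appended.

```python
import math
from collections import Counter

# Hypothetical toy mapping from a term to concept labels.
# A real system would look these up in a taxonomy such as ODP.
CONCEPTS = {
    "snake": ["reptiles"],
    "cobra": ["reptiles"],
    "laptop": ["computers"],
    "notebook": ["computers"],
}


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def enrich(tokens, concepts):
    """Append the concept labels of every token that has an entry."""
    extra = [label for tok in tokens for label in concepts.get(tok, [])]
    return tokens + extra


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words term-count vectors."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


s1 = tokenize("snake habitat guide")
s2 = tokenize("cobra venom facts")

plain = cosine(s1, s2)  # no shared words
enriched = cosine(enrich(s1, CONCEPTS), enrich(s2, CONCEPTS))  # share "reptiles"

print(plain, enriched)
```

Any clustering step that relies on such a similarity, for example K-means over term vectors, would then see the two snippets as related rather than orthogonal. How many concept terms to add, and from which level of the hierarchy, is a design choice this sketch does not address.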
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jinarat, S., Haruechaiyasak, C., Rungsawang, A. (2009). Web Snippet Clustering Based on Text Enrichment with Concept Hierarchy. In: Leung, C.S., Lee, M., Chan, J.H. (eds) Neural Information Processing. ICONIP 2009. Lecture Notes in Computer Science, vol 5864. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10684-2_34
DOI: https://doi.org/10.1007/978-3-642-10684-2_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10682-8
Online ISBN: 978-3-642-10684-2
eBook Packages: Computer Science, Computer Science (R0)