Web Snippet Clustering Based on Text Enrichment with Concept Hierarchy

Jinarat, Supakpong; Haruechaiyasak, Choochart; Rungsawang, Arnon

doi:10.1007/978-3-642-10684-2_34

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5864)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Clustering the web snippets returned by a search engine helps users browse and navigate the results. Because web snippets are extremely short, many traditional clustering techniques that adopt the bag-of-words model often yield unsatisfactory clustering results. In this paper, we propose a text enrichment method for improving the performance of web snippet clustering. The main idea is to expand the original snippets with related conceptual terms. We use the Open Directory Project (ODP), a web taxonomy organized by humans, to provide the concept hierarchy of web content. Using a test data set of 240 queries, we ran experiments with two clustering techniques: K-means clustering as the non-overlapping approach and Suffix Tree Clustering (STC) as the overlapping approach. With the proposed text enrichment method, K-means clustering improved overall performance by up to 15.51% based on the F1 measure. Suffix Tree Clustering with text enrichment improved performance by up to 53.71%.
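The enrichment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how snippets are mapped to ODP categories, so the `TOY_HIERARCHY` lookup table, the `concept:` prefix, and the helper names `tokenize`, `enrich`, and `cosine` are all hypothetical. The sketch only shows why adding conceptual terms can help a bag-of-words model: two short snippets with no words in common gain shared features once concept labels are appended.

```python
import math
from collections import Counter

# Hypothetical toy stand-in for a concept hierarchy such as ODP:
# each known term maps to a category path (assumed data, not from the paper).
TOY_HIERARCHY = {
    "sedan": ["Recreation", "Autos"],
    "engine": ["Recreation", "Autos"],
    "python": ["Computers", "Programming"],
    "compiler": ["Computers", "Programming"],
}

def tokenize(text):
    """Lowercase whitespace tokenizer that strips basic punctuation."""
    tokens = (w.strip(".,;:!?").lower() for w in text.split())
    return [t for t in tokens if t]

def enrich(tokens, hierarchy):
    """Append the concept labels of every token found in the hierarchy."""
    enriched = list(tokens)
    for tok in tokens:
        for concept in hierarchy.get(tok, []):
            enriched.append("concept:" + concept.lower())
    return enriched

def cosine(a, b):
    """Cosine similarity between two bag-of-words token lists."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

s1 = tokenize("New sedan reviews and prices")
s2 = tokenize("Rebuilding a classic engine")

before = cosine(s1, s2)  # no shared words in the raw snippets
after = cosine(enrich(s1, TOY_HIERARCHY), enrich(s2, TOY_HIERARCHY))
print(f"similarity before enrichment: {before:.3f}")
print(f"similarity after enrichment:  {after:.3f}")
```

In this toy case the raw snippets share no terms, so their similarity is zero, while the enriched versions share the two appended concept labels. A clustering algorithm such as K-means would then operate on the enriched vectors instead of the raw ones.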

Author information

Authors and Affiliations

Massive Information & Knowledge Engineering, Department of Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, 10900, Thailand
Supakpong Jinarat & Arnon Rungsawang
Human Language Technology Laboratory (HLT), National Electronics and Computer Technology Center (NECTEC), Thailand Science Park, Klong Luang, Pathumthani, 12120, Thailand
Choochart Haruechaiyasak

Editor information

Editors and Affiliations

Department of Electronic Engineering, City University of Hong Kong, Hong Kong
Chi Sing Leung
School of Electrical Engineering and Computer Science, Kyungpook National University, 1370 Sankyuk-Dong, Puk-Gu, 702-701, Taegu, Korea
Minho Lee
School of Information Technology, King Mongkut’s University of Technology Thonburi, 126 Pracha-U-Thit Rd., Bangmod, Thungkru, 10140, Bangkok, Thailand
Jonathan H. Chan

Cite this paper

Jinarat, S., Haruechaiyasak, C., Rungsawang, A. (2009). Web Snippet Clustering Based on Text Enrichment with Concept Hierarchy. In: Leung, C.S., Lee, M., Chan, J.H. (eds) Neural Information Processing. ICONIP 2009. Lecture Notes in Computer Science, vol 5864. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10684-2_34

DOI: https://doi.org/10.1007/978-3-642-10684-2_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10682-8
Online ISBN: 978-3-642-10684-2
eBook Packages: Computer Science, Computer Science (R0)
