Web Snippet Clustering Based on Text Enrichment with Concept Hierarchy

  • Conference paper
Neural Information Processing (ICONIP 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5864)

Abstract

Clustering the web snippets returned by a search engine makes it easier for users to browse and navigate the results. Because web snippets are extremely short, many traditional clustering techniques that adopt the bag-of-words model often yield unsatisfactory clustering results. In this paper, we propose a text enrichment method for improving the performance of web snippet clustering. The main idea is to expand the original snippets with related conceptual terms. We use the Open Directory Project (ODP), a web taxonomy organized by humans, to provide the concept hierarchy of web content. Using a test data set of 240 queries, we performed experiments with two clustering techniques: K-means clustering as the non-overlapping approach and Suffix Tree Clustering (STC) as the overlapping approach. With the proposed text enrichment method, K-means clustering improved its overall performance by up to 15.51% based on the F1 measure, while Suffix Tree Clustering with text enrichment improved its performance by up to 53.71%.
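The enrichment idea can be illustrated with a minimal sketch. The abstract does not say how snippets are mapped to ODP categories, so the code below is an assumption throughout: `TOY_HIERARCHY` is a hypothetical hand-made stand-in for a concept hierarchy, and `tokenize`, `enrich`, and `cosine` are illustrative helper names, not the authors' implementation. It only shows why appending concept terms can help a bag-of-words model: two short snippets with no words in common become similar once they share category terms.

```python
import math
from collections import Counter

# Hypothetical stand-in for a concept hierarchy such as ODP:
# each known keyword maps to its category path (assumed data).
TOY_HIERARCHY = {
    "python": ["computers", "programming"],
    "java": ["computers", "programming"],
}

def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()

def enrich(snippet, hierarchy):
    """Return the snippet's tokens followed by the concept terms they map to."""
    tokens = tokenize(snippet)
    concepts = []
    for tok in tokens:
        concepts.extend(hierarchy.get(tok, []))
    return tokens + concepts

def cosine(a, b):
    """Cosine similarity between two token lists under a bag-of-words model."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

s1 = "python tutorial"
s2 = "java reference"

# Without enrichment the snippets share no terms.
plain = cosine(tokenize(s1), tokenize(s2))
# With enrichment both gain the same two concept terms.
enriched = cosine(enrich(s1, TOY_HIERARCHY), enrich(s2, TOY_HIERARCHY))

print(plain, enriched)
```

In this toy case the similarity rises from 0.0 to 0.5, so a distance-based method such as K-means would be more likely to group the two snippets together. A real system would need a principled way to choose categories for each snippet and to weight the added terms.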

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jinarat, S., Haruechaiyasak, C., Rungsawang, A. (2009). Web Snippet Clustering Based on Text Enrichment with Concept Hierarchy. In: Leung, C.S., Lee, M., Chan, J.H. (eds) Neural Information Processing. ICONIP 2009. Lecture Notes in Computer Science, vol 5864. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10684-2_34

  • DOI: https://doi.org/10.1007/978-3-642-10684-2_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10682-8

  • Online ISBN: 978-3-642-10684-2

  • eBook Packages: Computer Science, Computer Science (R0)
