Synonyms Extraction Using Web Content Focused Crawling

  • Conference paper
Information Retrieval Technology (AIRS 2008)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4993))

Abstract

Documents and Web pages collected from the World Wide Web are considered one of the most important sources of information. Retrieving these documents with search engines yields a large amount of information, including foreign information, and facilitates information exchange and knowledge sharing. However, to make them easier for local readers to understand, foreign words, such as English words, are often translated into a local language such as Chinese. Because translators differ and no translation standard exists, the same foreign word may receive different transliterations, particularly for proper nouns such as personal names and geographical names. For example, Bin Laden is rendered as “賓拉登” (binladeng) or “本拉登” (benladeng); both are valid, synonymous transliterations. In this research, we propose an approach to determining synonymous transliterations by mining Web pages retrieved by a search engine. Experiments show that the proposed approach can effectively extract synonymous transliterations given an input transliteration.

Editor information

Hang Li, Ting Liu, Wei-Ying Ma, Tetsuya Sakai, Kam-Fai Wong, Guodong Zhou

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, CH., Hsu, CC. (2008). Synonyms Extraction Using Web Content Focused Crawling. In: Li, H., Liu, T., Ma, WY., Sakai, T., Wong, KF., Zhou, G. (eds) Information Retrieval Technology. AIRS 2008. Lecture Notes in Computer Science, vol 4993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68636-1_28

  • DOI: https://doi.org/10.1007/978-3-540-68636-1_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68633-0

  • Online ISBN: 978-3-540-68636-1

  • eBook Packages: Computer Science, Computer Science (R0)
