
Web-Based Terminology Translation Mining

  • Conference paper
Natural Language Processing – IJCNLP 2005 (IJCNLP 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3651)

Abstract

Mining terminology translations from large amounts of Web data has applications in many fields, such as reading/writing assistance, machine translation, and cross-language information retrieval. The challenging issues are how to find more comprehensive results on the Web, how to determine the boundaries of candidate translations, and how to remove irrelevant noise and rank the remaining candidates. In this paper, after reviewing and analyzing the possible methods of acquiring translations, we propose a feasible statistics-based method for mining terminology translations from the Web. Based on an analysis of the different forms in which term translations are distributed, the method uses character-based string frequency estimation to construct term translation candidates, which uncovers more translations and their boundaries. Sort-based subset deletion and mutual information methods are then proposed to handle, respectively, the subset redundancy and the prefix/suffix redundancy that arise during estimation. Extensive experiments on two test sets of 401 and 3511 English terms validate that our system achieves better performance.
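The abstract names the candidate-construction steps but gives no formulas, so the following is only an illustrative sketch of how character-based string frequency counting and a sort-based subset deletion pass might fit together. The function names, the `min_freq` and `ratio` thresholds, and the redundancy test are all assumptions made for illustration and are not taken from the paper; the mutual information step for prefix/suffix redundancy is left out.

```python
from collections import Counter


def count_substrings(snippets, min_len=2, max_len=4):
    """Count every character n-gram of length min_len..max_len in the snippets."""
    counts = Counter()
    for text in snippets:
        for n in range(min_len, max_len + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return counts


def delete_subsets(counts, min_freq=2, ratio=0.9):
    """Hypothetical subset deletion: visit candidates longest-first and drop a
    shorter string when an already kept longer string contains it and occurs
    at least `ratio` times as often (an assumed redundancy criterion)."""
    candidates = [(s, c) for s, c in counts.items() if c >= min_freq]
    # Sort by length (descending), then frequency (descending), then text.
    candidates.sort(key=lambda sc: (-len(sc[0]), -sc[1], sc[0]))
    kept = []
    for s, c in candidates:
        redundant = any(s in longer and longer_c >= ratio * c
                        for longer, longer_c in kept)
        if not redundant:
            kept.append((s, c))
    # Return the surviving candidates ranked by frequency.
    return sorted(kept, key=lambda sc: (-sc[1], sc[0]))


if __name__ == "__main__":
    # Toy strings standing in for target-language text from Web snippets.
    snippets = ["abcd", "abcd", "abxy"]
    print(delete_subsets(count_substrings(snippets)))
    # prints [('ab', 3), ('abcd', 2)]
```

In this toy run, `abc`, `bcd`, `bc`, and `cd` are dropped because they never occur outside `abcd`, while `ab` survives because it also appears in `abxy`. A real system would need a ranking step and a criterion tuned on data.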




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fang, G., Yu, H., Nishino, F. (2005). Web-Based Terminology Translation Mining. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_87

  • DOI: https://doi.org/10.1007/11562214_87

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29172-5

  • Online ISBN: 978-3-540-31724-1

  • eBook Packages: Computer Science, Computer Science (R0)
