Web-Based Terminology Translation Mining

Fang, Gaolin; Yu, Hao; Nishino, Fumihito

doi:10.1007/11562214_87

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3651)

Included in the following conference series:

International Conference on Natural Language Processing

Abstract

Mining terminology translations from large amounts of Web data has applications in many fields, such as reading and writing assistance, machine translation, and cross-language information retrieval. The challenging issues are how to find more comprehensive results on the Web, how to determine the boundaries of candidate translations, how to remove irrelevant noise, and how to rank the remaining candidates. In this paper, after reviewing and analyzing the possible methods of acquiring translations, we propose a feasible statistics-based method for mining terminology translations from the Web. Based on an analysis of the different forms in which term translations are distributed, the method uses character-based string frequency estimation to construct term translation candidates, which uncovers more translations and their boundaries. Sort-based subset deletion and mutual information are then applied to handle, respectively, the subset redundancy and the prefix/suffix redundancy that arise during estimation. Extensive experiments on two test sets of 401 and 3511 English terms validate that the system achieves better performance.
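The abstract gives no implementation details, so the following is only a minimal sketch of how character-based string frequency estimation followed by sort-based subset deletion might look. The function names, the length limits, the frequency threshold, and the `ratio` cutoff are all assumptions made for illustration and are not taken from the paper; the toy snippets are invented, and the mutual-information step for prefix/suffix redundancy is not shown.

```python
from collections import Counter


def substring_counts(snippets, min_len=2, max_len=6):
    """Count every character substring of length min_len..max_len.

    Hypothetical helper: the paper's actual estimation procedure is not
    described in the abstract.
    """
    counts = Counter()
    for text in snippets:
        n = len(text)
        for i in range(n):
            for length in range(min_len, max_len + 1):
                if i + length > n:
                    break
                counts[text[i:i + length]] += 1
    return counts


def delete_subsets(counts, min_freq=2, ratio=0.8):
    """Drop candidates that are substrings of a kept, longer candidate.

    Candidates are sorted longest first. A shorter string is treated as
    redundant when a kept longer string contains it and occurs at least
    `ratio` times as often (an assumed cutoff, not the paper's).
    """
    candidates = [(s, c) for s, c in counts.items() if c >= min_freq]
    candidates.sort(key=lambda sc: (-len(sc[0]), -sc[1], sc[0]))
    kept = []
    for s, c in candidates:
        redundant = any(
            s in longer and longer_c >= ratio * c for longer, longer_c in kept
        )
        if not redundant:
            kept.append((s, c))
    # Rank the surviving candidates by frequency, then alphabetically.
    kept.sort(key=lambda sc: (-sc[1], sc[0]))
    return kept


if __name__ == "__main__":
    # Invented toy snippets: "ABCD" stands in for a translation that
    # recurs in different surrounding contexts.
    snippets = ["xxABCDyy", "zzABCDww", "qqABCDpp"]
    counts = substring_counts(snippets, min_len=2, max_len=4)
    print(delete_subsets(counts, min_freq=2))  # prints [('ABCD', 3)]
```

In this toy run the fragments "AB", "ABC", "BCD" and so on occur exactly as often as "ABCD", so they are discarded as subset redundancy and only the full string survives with its boundaries intact. A fragment that occurred far more often than the longer string would fall below the assumed cutoff and be kept as a separate candidate.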

Author information

Authors and Affiliations

Fujitsu Research and Development Center, Co., LTD., Beijing, 100016, China
Gaolin Fang, Hao Yu & Fumihito Nishino

Editor information

Editors and Affiliations

Center for Language Technology, Macquarie University, 2019, Sydney, NSW, Australia
Robert Dale
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Kam-Fai Wong
Institute for Infocomm Research, 21, Heng Mui Keng Terrace, 119613, Singapore
Jian Su
Language Information Sciences Research Centre, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
Oi Yee Kwong

About this paper

Cite this paper

Fang, G., Yu, H., Nishino, F. (2005). Web-Based Terminology Translation Mining. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_87

DOI: https://doi.org/10.1007/11562214_87
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)
