Using Latent Semantics for NE Translation

Lim, Boon Pang; Sproat, Richard W.

doi:10.1007/11940098_48

Boon Pang Lim & Richard W. Sproat

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4285)

Abstract

This paper describes an algorithm that assists in discovering Named Entity (NE) translation pairs from large corpora. The algorithm is based on Latent Semantic Analysis (LSA) and Cross-Lingual Latent Semantic Indexing (CL-LSI), and is shown to discover new translation pairs automatically within a bootstrapping framework. Experiments quantify the interaction between corpus size, features, and algorithm parameters in order to clarify how the proposed approach works.
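
As a rough illustration of the general CL-LSI idea named in the abstract, the following sketch builds a term-by-document matrix from merged bilingual document pairs, reduces it with a truncated SVD, and ranks target-language terms by cosine similarity to a source-language name. This is not the authors' algorithm: the toy English/Chinese corpus, the function names (`build_matrix`, `latent_term_vectors`, `rank_candidates`), the raw-count weighting, and the choice `k=2` are all assumptions made for illustration, and the bootstrapping step is not shown.

```python
import numpy as np

# Hypothetical toy corpus: each item is one aligned (English, Chinese) document pair.
PAIRS = [
    (["clinton", "president", "visit"], ["克林顿", "总统", "访问"]),
    (["clinton", "speech"], ["克林顿", "演讲"]),
    (["beijing", "olympics", "visit"], ["北京", "奥运", "访问"]),
    (["beijing", "city"], ["北京", "城市"]),
    (["president", "olympics"], ["总统", "奥运"]),
]


def build_matrix(pairs):
    """Build a term-by-document count matrix; each column merges both sides of a pair."""
    terms = sorted(
        {("en", w) for en, _ in pairs for w in en}
        | {("zh", w) for _, zh in pairs for w in zh}
    )
    index = {t: i for i, t in enumerate(terms)}
    A = np.zeros((len(terms), len(pairs)))
    for j, (en, zh) in enumerate(pairs):
        for w in en:
            A[index[("en", w)], j] += 1.0
        for w in zh:
            A[index[("zh", w)], j] += 1.0
    return terms, index, A


def latent_term_vectors(A, k):
    """Project every term into a k-dimensional latent space via truncated SVD."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * s[:k]  # one row per term, scaled by the singular values


def rank_candidates(source, index, T):
    """Rank Chinese terms by cosine similarity to an English source term."""
    q = T[index[("en", source)]]
    scored = []
    for (lang, w), i in index.items():
        if lang != "zh":
            continue
        v = T[i]
        denom = np.linalg.norm(q) * np.linalg.norm(v)
        score = float(q @ v / denom) if denom > 0 else 0.0
        scored.append((w, score))
    return sorted(scored, key=lambda item: -item[1])


terms, index, A = build_matrix(PAIRS)
T = latent_term_vectors(A, k=2)
ranked = rank_candidates("clinton", index, T)
for word, score in ranked[:3]:
    print(word, round(score, 3))
```

In this toy corpus the true pair occurs in exactly the same documents, so its latent similarity is essentially 1. Real corpora are far noisier, which is presumably why the paper studies corpus size, features, and parameters.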

Author information

Authors and Affiliations

Dept of ECE, University of Illinois at Urbana Champaign, Urbana, IL, 61801, USA
Boon Pang Lim & Richard W. Sproat

Editor information

Editors and Affiliations

Graduate School of Information Science, Nara Institute of Science and Technology, 630-0192, Takayama, Ikoma, Nara, Japan
Yuji Matsumoto
Dept of ECE, University of Illinois at Urbana Champaign, Urbana, IL, 61801, USA
Richard W. Sproat
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Kam-Fai Wong
State Key Lab of Intelligent Tech. & Sys., Tsinghua University,
Min Zhang

About this paper

Cite this paper

Lim, B.P., Sproat, R.W. (2006). Using Latent Semantics for NE Translation. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_48

DOI: https://doi.org/10.1007/11940098_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer Science, Computer Science (R0)
