Abstract
This paper describes an algorithm that assists in discovering Named Entity (NE) translation pairs from large corpora. The algorithm is based on Latent Semantic Analysis (LSA) and Cross-Lingual Latent Semantic Indexing (CL-LSI), and it is shown to discover new translation pairs automatically within a bootstrapping framework. Experiments quantify the interaction between corpus size, features, and algorithm parameters in order to clarify how the proposed approach works.
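To make the CL-LSI idea concrete, the following is a minimal sketch of the generic technique, not the authors' algorithm, whose details are not given in the abstract. It assumes a corpus of aligned document pairs, builds one term-by-document matrix over both vocabularies, reduces it with a truncated SVD, and ranks target-language terms by cosine similarity to a source-language term. The function names `cl_lsi_term_vectors` and `rank_candidates`, the raw-count weighting, and the toy tokens are all illustrative assumptions; the bootstrapping step mentioned above is not implemented.

```python
import numpy as np


def cl_lsi_term_vectors(paired_docs, k=3):
    """Return (vocab, vectors) for a list of (source_tokens, target_tokens) pairs.

    Each aligned pair is treated as one merged document, so terms from both
    languages share a single latent space. Raw counts are an assumption here;
    a real system would likely apply some term weighting.
    """
    vocab = sorted(
        {("src", t) for src, _ in paired_docs for t in src}
        | {("tgt", t) for _, tgt in paired_docs for t in tgt}
    )
    index = {term: i for i, term in enumerate(vocab)}

    # Term-by-document count matrix over the merged vocabulary.
    X = np.zeros((len(vocab), len(paired_docs)))
    for j, (src, tgt) in enumerate(paired_docs):
        for t in src:
            X[index[("src", t)], j] += 1
        for t in tgt:
            X[index[("tgt", t)], j] += 1

    # Truncated SVD: keep the k largest singular directions.
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(S))
    return vocab, U[:, :k] * S[:k]


def rank_candidates(vocab, vectors, source_term):
    """Rank target-language terms by cosine similarity to source_term."""
    q = vectors[vocab.index(("src", source_term))]
    scored = []
    for (lang, term), v in zip(vocab, vectors):
        if lang != "tgt":
            continue
        denom = np.linalg.norm(q) * np.linalg.norm(v)
        score = float(q @ v / denom) if denom > 0 else 0.0
        scored.append((term, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy aligned corpus; the target-side tokens are made-up placeholders.
toy_docs = [
    (["london", "rain"], ["lundun", "yu"]),
    (["london", "bridge"], ["lundun", "qiao"]),
    (["paris", "rain"], ["bali", "yu"]),
    (["paris", "bridge"], ["bali", "qiao"]),
]
toy_vocab, toy_vectors = cl_lsi_term_vectors(toy_docs, k=3)
for term, score in rank_candidates(toy_vocab, toy_vectors, "london"):
    print(f"{term}\t{score:.3f}")
```

In this toy corpus the source and target names always co-occur, so they receive identical latent vectors and the matching name ranks first. On real data the co-occurrence is noisy, which is presumably where corpus size, feature choice, and the number of retained dimensions start to matter.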
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lim, B.P., Sproat, R.W. (2006). Using Latent Semantics for NE Translation. In: Matsumoto, Y., Sproat, R.W., Wong, K.-F., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_48
DOI: https://doi.org/10.1007/11940098_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer Science, Computer Science (R0)