Abstract
This paper addresses automatic machine translation. The proposed method translates a dictionary by mining text repositories in the source and target languages, without any directly given relationships connecting the two languages. It consists of two stages: (1) translation by lexical similarity, where words are compared graphically (that is, by their written form), and (2) translation by semantic similarity, where the contexts in which words occur are compared. The Polish and English versions of Wikipedia were used as multilingual corpora. The method and its stages are thoroughly analyzed. The results suggest that the method can be implemented in human-in-the-middle systems.
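The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measures, so the sketch assumes `difflib.SequenceMatcher` for graphical comparison, windowed co-occurrence counts as contexts, and cosine similarity for comparing them. All function names, the threshold, the window size, and the toy Polish/English data are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt


def lexical_similarity(a, b):
    """Graphical (string-level) similarity in [0, 1]. Assumed measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def lexical_seeds(src_words, tgt_words, threshold=0.8):
    """Stage 1: pair each source word with its most similar target word,
    keeping the pair only if the similarity reaches the threshold."""
    seeds = {}
    for s in src_words:
        best = max(tgt_words, key=lambda t: lexical_similarity(s, t))
        if lexical_similarity(s, best) >= threshold:
            seeds[s] = best
    return seeds


def context_vector(word, sentences, window=2):
    """Count the tokens that occur within `window` positions of `word`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == word:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(
                    t for j, t in enumerate(tokens[lo:hi], lo) if j != i
                )
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def semantic_translate(word, src_sents, tgt_sents, seeds, candidates):
    """Stage 2: map the source context into the target language through the
    seed pairs, then pick the candidate whose context is most similar."""
    mapped = Counter()
    for w, c in context_vector(word, src_sents).items():
        if w in seeds:
            mapped[seeds[w]] += c
    return max(candidates, key=lambda t: cosine(mapped, context_vector(t, tgt_sents)))


# Hypothetical toy corpora (already tokenized and lemmatized).
SRC_SENTS = [["student", "czyta", "książka"], ["profesor", "pisze", "książka"]]
TGT_SENTS = [
    ["student", "reads", "book"],
    ["professor", "writes", "book"],
    ["student", "drinks", "coffee"],
]

if __name__ == "__main__":
    seeds = lexical_seeds(["student", "profesor", "książka"],
                          ["student", "professor", "book"])
    print(seeds)
    print(semantic_translate("książka", SRC_SENTS, TGT_SENTS, seeds,
                             ["book", "coffee"]))
```

In this toy setting, stage 1 links only the graphically similar pairs, and stage 2 uses those pairs as seeds to translate a word ("książka") that has no graphically similar counterpart. A real system would need far larger corpora and a more careful choice of measures and thresholds.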
This work was supported by the National Centre for Research and Development (NCBiR) under Grant No. SP/I/1/77065/10, devoted to the strategic scientific research and experimental development program "Interdisciplinary System for Interactive Scientific and Scientific-Technical Information".
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Krajewski, R., Rybiński, H., Kozłowski, M. (2014). A Seed Based Method for Dictionary Translation. In: Andreasen, T., Christiansen, H., Cubero, JC., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2014. Lecture Notes in Computer Science, vol 8502. Springer, Cham. https://doi.org/10.1007/978-3-319-08326-1_42
DOI: https://doi.org/10.1007/978-3-319-08326-1_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08325-4
Online ISBN: 978-3-319-08326-1
eBook Packages: Computer Science, Computer Science (R0)