A Seed Based Method for Dictionary Translation

Krajewski, Robert; Rybiński, Henryk; Kozłowski, Marek

doi:10.1007/978-3-319-08326-1_42


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8502)

Included in the following conference series:

International Symposium on Methodologies for Intelligent Systems


Abstract

This paper addresses automatic machine translation. The proposed method translates a dictionary by mining text repositories in the source and target languages, without any directly given relationships connecting the two languages. It consists of two stages: (1) translation by lexical similarity, where words are compared graphically, and (2) translation by semantic similarity, where contexts are compared. The Polish and English versions of Wikipedia were used as multilingual corpora. The method and its stages are thoroughly analyzed. The results allow the method to be used in human-in-the-middle systems.
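The two stages can be illustrated with a minimal Python sketch. The abstract does not specify the similarity measures, so everything below is an assumption made for illustration: character-level similarity from `difflib` stands in for the graphical comparison, cosine similarity over co-occurrence counts stands in for the context comparison, and all function names, the threshold, and the window size are hypothetical rather than taken from the paper.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt


def lexical_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; an assumed stand-in for graphical comparison."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def seed_by_lexical_similarity(src_words, tgt_words, threshold=0.8):
    """Stage 1 (sketch): pair a source word with its most similar target word
    when the similarity reaches the (hypothetical) threshold."""
    seeds = {}
    for s in src_words:
        best = max(tgt_words, key=lambda t: lexical_similarity(s, t))
        if lexical_similarity(s, best) >= threshold:
            seeds[s] = best
    return seeds


def context_vector(word, sentences, window=2):
    """Count the tokens that co-occur with `word` within `window` positions."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == word:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(t for j, t in enumerate(tokens[lo:hi], lo) if j != i)
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def translate_by_context(word, src_sents, tgt_sents, seeds, candidates):
    """Stage 2 (sketch): map the source context into the target language through
    the seed pairs, then pick the candidate whose context is closest."""
    mapped = Counter()
    for w, c in context_vector(word, src_sents).items():
        if w in seeds:
            mapped[seeds[w]] += c
    return max(candidates, key=lambda t: cosine(mapped, context_vector(t, tgt_sents)))


# Toy data, invented for illustration only.
seeds = seed_by_lexical_similarity(
    ["system", "algorytm", "pies"], ["system", "algorithm", "dog"]
)
src_sents = [["pies", "algorytm", "system"]]
tgt_sents = [["dog", "algorithm", "system"], ["cat", "table", "chair"]]
guess = translate_by_context("pies", src_sents, tgt_sents, seeds, ["dog", "cat"])
```

In this toy setup, words with near-identical spelling become seed pairs in the first stage, and the second stage uses those seeds as a bridge to compare the contexts of words whose spellings differ. A real corpus would need tokenization, frequency weighting, and candidate filtering that are not shown here.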

This work was supported by the National Centre for Research and Development (NCBiR) under Grant No. SP/I/1/77065/10 devoted to the Strategic scientific research and experimental development program: "Interdisciplinary System for Interactive Scientific and Scientific-Technical Information".

Author information

Authors and Affiliations

Warsaw University of Technology, Warsaw, Poland
Robert Krajewski, Henryk Rybiński & Marek Kozłowski

Editor information

Editors and Affiliations

Research Group PLIS: Programming, Logic and Intelligent Systems, Dept. of Communication, Business and Information Technologies, Roskilde University, Denmark
Troels Andreasen & Henning Christiansen
Department of Computer Science and Artificial Intelligence, CITIC, University of Granada, 18071, Granada, Spain
Juan-Carlos Cubero
University of North Carolina, 9201 University City Blvd, Charlotte, NC 28223, USA, and Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warsaw, Poland
Zbigniew W. Raś

Cite this paper

Krajewski, R., Rybiński, H., Kozłowski, M. (2014). A Seed Based Method for Dictionary Translation. In: Andreasen, T., Christiansen, H., Cubero, JC., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2014. Lecture Notes in Computer Science, vol 8502. Springer, Cham. https://doi.org/10.1007/978-3-319-08326-1_42

DOI: https://doi.org/10.1007/978-3-319-08326-1_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08325-4
Online ISBN: 978-3-319-08326-1
eBook Packages: Computer Science, Computer Science (R0)
