Abstract
We present an approach to machine translation that applies the GenPar toolkit to POS-tagged and syntactically parsed texts. Our Hungarian-English experiment is an attempt to develop prototypes of a syntax-driven machine translation system and to examine how various preprocessing steps (POS tagging, lemmatization and syntactic parsing) affect system performance. The annotated monolingual texts needed for the language-specific tasks were taken from the Szeged Treebank and the Penn Treebank, and the parallel sentences were collected from the Hunglish Corpus. Each prototype runs fully automatically and includes newly built-in Hungarian-specific functions. The results are evaluated using the BLEU score.
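For readers unfamiliar with the evaluation metric, the following is a minimal sketch of BLEU for one candidate sentence against one reference: the geometric mean of clipped n-gram precisions, multiplied by a brevity penalty. The function names `ngrams` and `bleu` are hypothetical, and the sentence-level, single-reference, unsmoothed setup is an assumption made for illustration; the abstract does not state how the authors configured their evaluation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Single-reference BLEU with uniform weights and no smoothing (illustrative)."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if matched == 0 or total == 0:
            # Without smoothing, one empty n-gram order makes the score zero.
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: candidates no longer than the reference are penalised.
    if len(candidate) > len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    hypothesis = "the cat sat on".split()
    reference = "the cat sat on the mat".split()
    print(bleu(hypothesis, reference))
```

In this toy case every n-gram of the hypothesis appears in the reference, so the score is determined entirely by the brevity penalty. Published BLEU figures are normally computed at corpus level, so a sentence-level sketch like this one is not directly comparable to reported results.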
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hócza, A., Kocsor, A. (2006). Hungarian-English Machine Translation Using GenPar. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2006. Lecture Notes in Computer Science, vol 4188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11846406_11
DOI: https://doi.org/10.1007/11846406_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39090-9
Online ISBN: 978-3-540-39091-6
eBook Packages: Computer Science (R0)