Abstract
This paper describes the use of a statistical structural N-gram model in the natural language generation component of a Spanish-English generation-heavy hybrid machine translation system. A structural N-gram model captures the relationship between words in a dependency representation without taking into account the overall structure at the phrase level. The model is used together with other components in the system for lexical and structural selection. An evaluation of the machine translation system shows that the use of structural N-grams decreases runtime by 60% with no loss in translation quality.
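The idea of scoring word relationships in a dependency representation can be illustrated with a small sketch. It is not the system's implementation: it assumes a structural bigram is a (parent word, child word) pair in a dependency tree, uses add-one smoothing, and represents trees as nested `(word, children)` tuples. The names `structural_bigrams` and `StructuralBigramModel` and the toy data are hypothetical.

```python
from collections import Counter
from math import log

# Assumed representation: a dependency node is (word, [child nodes]).

def structural_bigrams(node):
    """Yield (parent, child) word pairs, ignoring surface word order."""
    word, children = node
    for child in children:
        yield (word, child[0])
        yield from structural_bigrams(child)

def words(node):
    """Yield every word in the tree."""
    yield node[0]
    for child in node[1]:
        yield from words(child)

class StructuralBigramModel:
    """Parent-child bigram counts with add-one smoothing (an assumption)."""

    def __init__(self, trees):
        self.pair_counts = Counter()
        self.parent_counts = Counter()
        self.vocab = set()
        for tree in trees:
            self.vocab.update(words(tree))
            for parent, child in structural_bigrams(tree):
                self.pair_counts[(parent, child)] += 1
                self.parent_counts[parent] += 1

    def log_prob(self, tree):
        """Sum of smoothed log P(child | parent) over all tree edges."""
        v = len(self.vocab)
        total = 0.0
        for parent, child in structural_bigrams(tree):
            num = self.pair_counts[(parent, child)] + 1
            den = self.parent_counts[parent] + v
            total += log(num / den)
        return total

# Toy training trees (invented for illustration only).
training = [
    ("drink", [("John", []), ("coffee", [("strong", [])])]),
    ("drink", [("Mary", []), ("coffee", [("strong", [])])]),
    ("drink", [("Mary", []), ("tea", [("hot", [])])]),
]
model = StructuralBigramModel(training)

# Lexical selection: choose between two candidate modifiers for "coffee".
good = ("drink", [("John", []), ("coffee", [("strong", [])])])
bad = ("drink", [("John", []), ("coffee", [("powerful", [])])])
best = max([good, bad], key=model.log_prob)
```

Because each score depends only on individual parent-child edges, candidates can be compared before any linearization takes place, which is one plausible reading of how such a model could prune choices early.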
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Habash, N. (2004). The Use of a Structural N-gram Language Model in Generation-Heavy Hybrid Machine Translation. In: Belz, A., Evans, R., Piwek, P. (eds) Natural Language Generation. INLG 2004. Lecture Notes in Computer Science, vol 3123. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27823-8_7
DOI: https://doi.org/10.1007/978-3-540-27823-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22340-5
Online ISBN: 978-3-540-27823-8
eBook Packages: Springer Book Archive