The Use of a Structural N-gram Language Model in Generation-Heavy Hybrid Machine Translation

Habash, Nizar

doi:10.1007/978-3-540-27823-8_7

Nizar Habash

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3123)

Included in the following conference series:

International Conference on Natural Language Generation

Abstract

This paper describes the use of a statistical structural N-gram model in the natural language generation component of a Spanish-English generation-heavy hybrid machine translation system. A structural N-gram model captures the relationship between words in a dependency representation without taking into account the overall structure at the phrase level. The model is used together with other components in the system for lexical and structural selection. An evaluation of the machine translation system shows that the use of structural N-grams decreases runtime by 60% with no loss in translation quality.
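To make the idea in the abstract concrete, the following is a minimal sketch of a structural bigram model: it counts parent-child word pairs in dependency trees and scores a candidate tree by the sum of the log-probabilities of its pairs, ignoring word order and phrase-level structure. The tree encoding, the class name `StructuralBigramModel`, the toy training trees, and the add-one smoothing are all assumptions made for illustration; they are not taken from the paper and do not describe the system's actual implementation.

```python
import math
from collections import Counter

# Assumed encoding: a dependency tree is (head_word, [child_subtrees]).

def parent_child_pairs(tree):
    """Yield every (parent, child) word pair in a dependency tree."""
    head, children = tree
    for child in children:
        yield head, child[0]
        yield from parent_child_pairs(child)

def words(tree):
    """Yield every word in a dependency tree."""
    head, children = tree
    yield head
    for child in children:
        yield from words(child)

class StructuralBigramModel:
    """Hypothetical structural bigram model: P(child | parent)."""

    def __init__(self, trees):
        self.pair_counts = Counter()
        self.parent_counts = Counter()
        self.vocab = set()
        for tree in trees:
            self.vocab.update(words(tree))
            for parent, child in parent_child_pairs(tree):
                self.pair_counts[(parent, child)] += 1
                self.parent_counts[parent] += 1

    def log_prob(self, parent, child):
        # Add-one smoothing is an assumption, chosen only for simplicity.
        numerator = self.pair_counts[(parent, child)] + 1
        denominator = self.parent_counts[parent] + len(self.vocab)
        return math.log(numerator / denominator)

    def score(self, tree):
        # Only parent-child relations contribute; linear order does not.
        return sum(self.log_prob(p, c) for p, c in parent_child_pairs(tree))

# Toy training data (invented for this example).
training_trees = [
    ("ate", [("dog", [("the", [])]), ("bone", [("a", [])])]),
    ("ate", [("cat", [("the", [])]), ("fish", [("a", [])])]),
]
model = StructuralBigramModel(training_trees)

# Two candidate lexical choices for the object of "ate".
seen_candidate = ("ate", [("dog", []), ("bone", [])])
unseen_candidate = ("ate", [("dog", []), ("the", [])])
best = max([seen_candidate, unseen_candidate], key=model.score)
```

In this sketch `best` is the candidate whose parent-child pairs were observed in training, which is how such a model could rank alternatives during lexical selection before any surface string is produced.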

Author information

Authors and Affiliations

University Institute for Advanced Computer Studies, University of Maryland, College Park
Nizar Habash

Editor information

Editors and Affiliations

Information Technology Research Institute, University of Brighton, Lewes Road, BN2 4GJ, Brighton, UK
Anja Belz
University of Brighton, Brighton, UK
Roger Evans
NLG Group, Centre for Research in Computing, The Open University, Walton Hall, MK7 6AA, Milton Keynes, UK
Paul Piwek

About this paper

Cite this paper

Habash, N. (2004). The Use of a Structural N-gram Language Model in Generation-Heavy Hybrid Machine Translation. In: Belz, A., Evans, R., Piwek, P. (eds) Natural Language Generation. INLG 2004. Lecture Notes in Computer Science, vol 3123. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27823-8_7

DOI: https://doi.org/10.1007/978-3-540-27823-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22340-5
Online ISBN: 978-3-540-27823-8
eBook Packages: Springer Book Archive