The Use of a Structural N-gram Language Model in Generation-Heavy Hybrid Machine Translation

  • Conference paper
Natural Language Generation (INLG 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3123)

Abstract

This paper describes the use of a statistical structural N-gram model in the natural language generation component of a Spanish-English generation-heavy hybrid machine translation system. A structural N-gram model captures the relationship between words in a dependency representation without taking into account the overall structure at the phrase level. The model is used together with other components in the system for lexical and structural selection. An evaluation of the machine translation system shows that the use of structural N-grams decreases runtime by 60% with no loss in translation quality.
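
As an illustration of the idea in the abstract, the following is a minimal sketch of a structural bigram model, assuming that a structural N-gram with N = 2 is a parent-child word pair in a dependency tree. It is not the paper's implementation: the nested `(word, children)` tree format, the class name `StructuralBigramModel`, the add-one smoothing, and the toy training trees are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def parent_child_pairs(tree):
    """Yield (parent_word, child_word) pairs from a nested (word, children) tree."""
    word, children = tree
    for child in children:
        yield (word, child[0])
        yield from parent_child_pairs(child)


def words_in(tree):
    """Yield every word in the tree, head first."""
    word, children = tree
    yield word
    for child in children:
        yield from words_in(child)


class StructuralBigramModel:
    """Estimates P(child | parent) from dependency trees (hypothetical sketch)."""

    def __init__(self, trees):
        self.pair_counts = Counter()
        self.parent_counts = Counter()
        self.vocab = set()
        for tree in trees:
            self.vocab.update(words_in(tree))
            for parent, child in parent_child_pairs(tree):
                self.pair_counts[(parent, child)] += 1
                self.parent_counts[parent] += 1

    def log_prob(self, parent, child):
        # Add-one smoothing is an assumption made here so unseen pairs get
        # a non-zero probability; the paper may smooth differently.
        numerator = self.pair_counts[(parent, child)] + 1
        denominator = self.parent_counts[parent] + len(self.vocab)
        return math.log(numerator / denominator)

    def score(self, tree):
        # Sum over parent-child pairs only: word order and phrase-level
        # structure play no role in the score.
        return sum(self.log_prob(p, c) for p, c in parent_child_pairs(tree))


# Toy training data (invented for illustration).
training_trees = [
    ("gave", [("John", []), ("book", [("a", [])])]),
    ("read", [("Mary", []), ("book", [("the", [])])]),
]
model = StructuralBigramModel(training_trees)

# Lexical selection: pick the candidate dependent that best fits its head.
candidates = ["book", "the"]
best = max(candidates, key=lambda w: model.log_prob("gave", w))
```

Because the score depends only on head-dependent pairs, candidates can be compared on the dependency representation itself, before any linearization into a word sequence.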

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Habash, N. (2004). The Use of a Structural N-gram Language Model in Generation-Heavy Hybrid Machine Translation. In: Belz, A., Evans, R., Piwek, P. (eds) Natural Language Generation. INLG 2004. Lecture Notes in Computer Science, vol 3123. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27823-8_7

  • DOI: https://doi.org/10.1007/978-3-540-27823-8_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22340-5

  • Online ISBN: 978-3-540-27823-8

  • eBook Packages: Springer Book Archive
