
Abstract

Recent work has shown the effectiveness of neural probabilistic language models (NPLMs) in statistical machine translation (SMT), both for reranking n-best outputs and for direct decoding. However, several issues remain in applying NPLMs to SMT. In this paper we investigate these issues through detailed experiments and extensions of state-of-the-art NPLMs. Our experiments on large-scale datasets show that our final setting, decoding with conventional n-gram LMs plus un-normalized feedforward NPLMs extended with word clusters, significantly improves translation performance by up to 1.1 Bleu on average across four test sets, while keeping decoding time acceptable. The results also show that current NPLMs, both feedforward and recurrent, still cannot simply replace n-gram LMs in SMT.
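For concreteness, the sketch below (Python with NumPy; the class name, dimensions, and feature layout are illustrative assumptions, not the authors' implementation) shows one way an un-normalized feedforward NPLM extended with word clusters could score a target word at decoding time: the context is a fixed window of previous words plus their cluster ids, and the output score is used directly, skipping softmax normalization, as a model trained with noise-contrastive estimation permits.

```python
# Minimal sketch (assumed, not the paper's code) of an un-normalized feedforward
# NPLM with word-cluster features, scoring a single target word given its context.
import numpy as np

class UnnormalizedFeedforwardNPLM:
    def __init__(self, vocab_size, num_clusters, emb_dim=192, hidden_dim=512,
                 context_size=4, seed=0):
        rng = np.random.default_rng(seed)
        # Separate embedding tables for surface words and for their cluster ids.
        self.word_emb = rng.normal(0.0, 0.1, (vocab_size, emb_dim))
        self.cluster_emb = rng.normal(0.0, 0.1, (num_clusters, emb_dim))
        in_dim = 2 * emb_dim * context_size
        self.W_h = rng.normal(0.0, 0.1, (hidden_dim, in_dim))
        self.b_h = np.zeros(hidden_dim)
        # Output layer: one row per target word. With NCE-style training the raw
        # score can be used directly, avoiding the softmax over the full vocabulary.
        self.W_o = rng.normal(0.0, 0.1, (vocab_size, hidden_dim))
        self.b_o = np.zeros(vocab_size)

    def score(self, context_words, context_clusters, target_word):
        """Un-normalized log-score of target_word given word and cluster context ids."""
        x = np.concatenate([self.word_emb[w] for w in context_words] +
                           [self.cluster_emb[c] for c in context_clusters])
        h = np.maximum(0.0, self.W_h @ x + self.b_h)  # rectified linear hidden layer
        return float(self.W_o[target_word] @ h + self.b_o[target_word])

# Toy usage: score one target word given a 4-word history and its cluster ids.
lm = UnnormalizedFeedforwardNPLM(vocab_size=10000, num_clusters=512)
s = lm.score(context_words=[5, 42, 7, 9], context_clusters=[3, 11, 3, 8], target_word=123)
```

In decoding, such a score would be added as one more feature in the log-linear model, alongside the conventional n-gram LM score rather than replacing it.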





Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhao, Y., Huang, S., Chen, H., Chen, J. (2014). An Investigation on Statistical Machine Translation with Neural Language Models. In: Sun, M., Liu, Y., Zhao, J. (eds.) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD/CCL 2014. Lecture Notes in Computer Science, vol. 8801. Springer, Cham. https://doi.org/10.1007/978-3-319-12277-9_16


  • DOI: https://doi.org/10.1007/978-3-319-12277-9_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12276-2

  • Online ISBN: 978-3-319-12277-9
