
Improving Conversational Spoken Language Machine Translation via Pronoun Recovery

  • Conference paper
Social Media Processing (SMP 2015)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 568)

Abstract

Machine translation for social communication is needed in daily life. However, spoken language translation faces many challenges, especially the translation of zero pronouns, which are absent in the source language but appear in the target language. Pronoun dropping severely affects machine translation from pro-drop languages such as Chinese into other languages, and the phenomenon occurs more frequently in conversational spoken language. To address this problem, we insert the missing pronouns at their positions on the source side, and then use word alignment to filter the inserted pronouns, keeping only those that actually help machine translation. The method improves translation on chat, message, and telephone conversation corpora.
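The two-step pipeline described in the abstract (insert candidate pronouns on the source side, then keep only those that word alignment supports) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the pronoun predictions are taken as given (the abstract does not say how missing-pronoun positions are detected), the filter assumes a parallel training sentence pair with a precomputed word alignment, the keep criterion "aligned to a target-side pronoun" is an assumption, and the names `insert_pronouns`, `filter_pronouns`, and `TARGET_PRONOUNS` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical list of English pronouns used to decide whether an inserted
# source-side pronoun has a pronoun counterpart on the target side.
TARGET_PRONOUNS: Set[str] = {
    "i", "you", "he", "she", "it", "we", "they",
    "me", "him", "her", "us", "them",
}

# Word alignment as (source index, target index) pairs, e.g. read from an
# aligner's "i-j" output (assumed format).
Alignment = List[Tuple[int, int]]


def insert_pronouns(src: List[str], predictions: Dict[int, str]) -> Tuple[List[str], Set[int]]:
    """Insert predicted pronouns before the given source positions.

    `predictions` maps an original token index to the pronoun that should
    precede it (assumed output of some zero-pronoun detector).
    Returns the new token list and the indices of the inserted tokens.
    """
    out: List[str] = []
    inserted: Set[int] = set()
    for i, tok in enumerate(src):
        if i in predictions:
            inserted.add(len(out))  # index of the pronoun in the new sentence
            out.append(predictions[i])
        out.append(tok)
    return out, inserted


def filter_pronouns(src: List[str], inserted: Set[int], tgt: List[str], align: Alignment) -> List[str]:
    """Keep an inserted pronoun only if it aligns to a target-side pronoun.

    Original source tokens are always kept; inserted pronouns that are
    unaligned or aligned only to non-pronoun target words are dropped.
    """
    keep: Set[int] = set()
    for s, t in align:
        if s in inserted and tgt[t].lower() in TARGET_PRONOUNS:
            keep.add(s)
    return [tok for i, tok in enumerate(src) if i not in inserted or i in keep]


if __name__ == "__main__":
    # "吃 了 吗" (eat ASP Q) with a predicted dropped subject "你" (you).
    new_src, ins = insert_pronouns(["吃", "了", "吗"], {0: "你"})
    tgt = ["did", "you", "eat", "?"]
    print(filter_pronouns(new_src, ins, tgt, [(0, 1), (1, 2), (2, 0), (3, 3)]))
```

Requiring the aligned target word to be a pronoun is only one plausible criterion; the abstract states that word alignment is used to pick out the pronouns that really help translation, but it does not give the exact rule.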

Notes

  1. This corpus comes from the DARPA Broad Operational Language Translation (BOLT) Program, which includes message, chat, and telephone conversation parallel data sets. The website is https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/bolt_1.pdf.

Acknowledgments

This work is supported by the National Basic Research Program of China (973 Program, Grant No. 2013CB329303), the National Natural Science Foundation of China (Grant No. 61132009, 61202244) and Beijing Institute of Technology Research Fund Program for Young Scholars.

Author information

Correspondence to Yanlin Hu.

Copyright information

© 2015 Springer Science+Business Media Singapore

About this paper

Cite this paper

Hu, Y., Huang, H., Jian, P., Guo, Y. (2015). Improving Conversational Spoken Language Machine Translation via Pronoun Recovery. In: Zhang, X., Sun, M., Wang, Z., Huang, X. (eds) Social Media Processing. SMP 2015. Communications in Computer and Information Science, vol 568. Springer, Singapore. https://doi.org/10.1007/978-981-10-0080-5_20

  • DOI: https://doi.org/10.1007/978-981-10-0080-5_20

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-0079-9

  • Online ISBN: 978-981-10-0080-5

  • eBook Packages: Computer Science, Computer Science (R0)
