Role of Paraphrases in PB-SMT

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 8404))

Abstract

Statistical Machine Translation (SMT) provides a convenient framework for representing how the translation process is modeled. The translations of words or phrases are generally computed from their occurrences in a bilingual training corpus. However, SMT still suffers from out-of-vocabulary (OOV) words and infrequent words, especially when only limited training data are available or when the training and test data come from different domains. In this paper, we propose a convenient way to handle OOV and rare words using a paraphrasing technique. First, we extract paraphrases from the bilingual training corpus with the help of comparable corpora. The extracted paraphrases are then analyzed by conditionally checking the association of their monolingual distributions. Bilingually aligned paraphrases are incorporated into the phrase-based SMT (PB-SMT) system as additional training data. Integrating paraphrases into the PB-SMT system results in a significant improvement.
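
The abstract does not spell out the extraction procedure. As a rough illustration only, one widely used way to obtain paraphrases from bilingual phrase pairs is pivoting: two phrases in one language are treated as paraphrase candidates when they align to the same phrase in the other language, and a candidate is scored as p(e2 | e1) = Σ_f p(f | e1) · p(e2 | f). The sketch below assumes that approach, which may differ from the authors' method, and it does not cover the comparable-corpora step or the association check. The function name `pivot_paraphrases`, the toy English-German phrase pairs, and their counts are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy phrase pairs: (English phrase, German phrase, co-occurrence count).
TOY_PAIRS = [
    ("under control", "unter kontrolle", 3),
    ("in check", "unter kontrolle", 1),
    ("under control", "im griff", 1),
]


def pivot_paraphrases(phrase_pairs):
    """Score paraphrase candidates by pivoting through the other language.

    Returns {e1: {e2: score}} with score = sum over pivots f of
    p(f | e1) * p(e2 | f), using relative frequencies of the given counts.
    """
    src_tgt = defaultdict(lambda: defaultdict(float))
    tgt_src = defaultdict(lambda: defaultdict(float))
    for src, tgt, count in phrase_pairs:
        src_tgt[src][tgt] += count
        tgt_src[tgt][src] += count

    scores = defaultdict(lambda: defaultdict(float))
    for e1, pivots in src_tgt.items():
        e1_total = sum(pivots.values())
        for f, count_e1_f in pivots.items():
            p_f_given_e1 = count_e1_f / e1_total
            f_total = sum(tgt_src[f].values())
            for e2, count_f_e2 in tgt_src[f].items():
                if e2 != e1:  # a phrase is not its own paraphrase
                    scores[e1][e2] += p_f_given_e1 * (count_f_e2 / f_total)
    return {e1: dict(candidates) for e1, candidates in scores.items()}


if __name__ == "__main__":
    for phrase, candidates in pivot_paraphrases(TOY_PAIRS).items():
        print(phrase, candidates)
```

In a pipeline of the kind the abstract describes, candidates like these would presumably be filtered further before being added to the training data; that filtering is not shown here.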

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pal, S., Lohar, P., Naskar, S.K. (2014). Role of Paraphrases in PB-SMT. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_21

  • DOI: https://doi.org/10.1007/978-3-642-54903-8_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54902-1

  • Online ISBN: 978-3-642-54903-8

  • eBook Packages: Computer Science, Computer Science (R0)
