
Phrase table re-adjustment for statistical machine translation

International Journal of Speech Technology

Abstract

Neither assigning similar priority to all phrases nor pruning incorrect phrases from the phrase table can improve the accuracy of machine translation. In this paper, we present a novel method for re-adjusting phrase table weights in a statistical machine translation system. The method learns correct and incorrect phrases from bilingual corpora. Based on syntactic phrase-level information, the phrase table is updated with weights estimated using a probability distribution. Evaluation on English–Hindi technical-domain corpora shows that the proposed method produces better output in terms of the BLEU, RIBES and NIST metrics. We show that the proposed method also works well for other language pairs such as Hindi–Konkani and Bengali–Hindi. Finally, we find that this minor probabilistic change can substantially improve the accuracy of the machine translation system.



Notes

  1. HT: Main padne wala hun.

  2. http://nlp.stanford.edu:8080/parser/.

  3. http://ltrc.iiit.ac.in/showfile.php?filename=downloads/shallow_parser.php.

  4. https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html.


Author information


Correspondence to Debajyoty Banik.


About this article


Cite this article

Banik, D. Phrase table re-adjustment for statistical machine translation. Int J Speech Technol 24, 903–911 (2021). https://doi.org/10.1007/s10772-020-09676-0


  • DOI: https://doi.org/10.1007/s10772-020-09676-0
