Phrase table re-adjustment for statistical machine translation

Banik, Debajyoty

doi:10.1007/s10772-020-09676-0

Published: 05 February 2020

Volume 24, pages 903–911, (2021)

International Journal of Speech Technology

Abstract

Neither assigning similar priority to all phrases nor pruning out the incorrect phrases from the phrase table can improve the accuracy of machine translation. In this paper, we present a novel method for weight re-adjustment of the phrase table in a statistical machine translation system. It learns the correct and incorrect phrases from bilingual corpora. Based on syntactic phrase-level information, the phrase table is updated with weights estimated using a probability distribution. Evaluation on English–Hindi technical domain corpora shows that our proposed method produces better output in terms of the BLEU, RIBES and NIST metrics. We show that the proposed method also works well for other language pairs such as Hindi–Konkani and Bengali–Hindi. Finally, we find that this minor probabilistic change can substantially improve the accuracy of the machine translation system.
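
The abstract does not give the re-adjustment formula, so the following is only a minimal sketch of the general idea, not the author's method. It assumes a Moses-style phrase table whose third score is the direct translation probability p(target | source), a hypothetical predicate `is_syntactic` that marks a phrase pair as syntactically correct, and an arbitrary boost factor. All names and the toy entries are illustrative assumptions.

```python
from collections import defaultdict

def parse_entry(line):
    """Split one Moses-style line 'src ||| tgt ||| scores' into its parts."""
    fields = [f.strip() for f in line.split("|||")]
    scores = [float(s) for s in fields[2].split()]
    return fields[0], fields[1], scores

def readjust(entries, is_syntactic, boost=2.0, index=2):
    """Boost scores[index] for pairs judged syntactic, then rescale so the
    total probability mass per source phrase stays what it was before."""
    original_mass = defaultdict(float)
    boosted_mass = defaultdict(float)
    boosted = []
    for source, target, scores in entries:
        factor = boost if is_syntactic(source, target) else 1.0
        p = scores[index] * factor
        boosted.append((source, target, scores, p))
        original_mass[source] += scores[index]
        boosted_mass[source] += p
    adjusted = []
    for source, target, scores, p in boosted:
        new_scores = list(scores)
        new_scores[index] = p * original_mass[source] / boosted_mass[source]
        adjusted.append((source, target, new_scores))
    return adjusted

# Toy entries (made up for illustration; not taken from the paper's corpora).
lines = [
    "the house ||| ghar ||| 0.5 0.5 0.6 0.5",
    "the house ||| makaan ko ||| 0.5 0.5 0.4 0.5",
]
entries = [parse_entry(line) for line in lines]

# Hypothetical judgement: only the first pair is treated as syntactic.
adjusted = readjust(entries, lambda s, t: t == "ghar")
for source, target, scores in adjusted:
    print(source, "|||", target, "|||", " ".join(f"{x:.3f}" for x in scores))
```

With these toy numbers the direct probability of the favoured pair rises from 0.6 to roughly 0.75 and the other falls from 0.4 to roughly 0.25. Rescaling per source phrase is one possible design choice: it shifts weight towards the preferred phrases without pruning any entry, which matches the abstract's point that neither uniform priority nor pruning is the goal.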

Author information

Authors and Affiliations

School of Computer Engineering, Kalinga Institute of Industrial Technology, Bhubaneswar, India
Debajyoty Banik

Corresponding author

Correspondence to Debajyoty Banik.

About this article

Cite this article

Banik, D. Phrase table re-adjustment for statistical machine translation. Int J Speech Technol 24, 903–911 (2021). https://doi.org/10.1007/s10772-020-09676-0

Received: 15 November 2019
Accepted: 13 January 2020
Published: 05 February 2020
Issue Date: December 2021
DOI: https://doi.org/10.1007/s10772-020-09676-0
