Abstract
Neither assigning similar priority to all phrases nor pruning incorrect phrases out of the phrase table improves the accuracy of machine translation. In this paper, we present a novel method for re-adjusting the weights of the phrase table in a statistical machine translation system. The method learns correct and incorrect phrases from bilingual corpora. Based on syntactic phrase-level information, the phrase table is updated with weights estimated from a probability distribution. Evaluation on English–Hindi technical-domain corpora shows that the proposed method produces better output in terms of the BLEU, RIBES and NIST metrics. We also show that the proposed method works well for other language pairs such as Hindi–Konkani and Bengali–Hindi. Overall, we find that this minor probabilistic change can substantially improve the accuracy of the machine translation system.
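As a rough illustration of what re-adjusting phrase-table weights can look like, the sketch below rescales the translation probability of phrase pairs flagged as correct and then renormalizes per source phrase. It is not the procedure used in the paper: the function `reweight`, the predicate `is_syntactic`, the `boost` factor and the toy entries are all hypothetical, chosen only to make the idea concrete.

```python
from collections import defaultdict

def reweight(entries, is_syntactic, boost=2.0):
    """Rescale and renormalize phrase-pair probabilities.

    entries: list of (source, target, prob), where prob approximates p(target | source).
    is_syntactic: hypothetical predicate that marks a phrase pair as "correct".
    boost: assumed multiplicative factor applied to flagged pairs.
    """
    # Scale flagged pairs up; leave the rest unchanged (no pruning).
    scaled = [(s, t, p * (boost if is_syntactic(s, t) else 1.0))
              for s, t, p in entries]

    # Sum the scaled weights for each source phrase.
    totals = defaultdict(float)
    for s, _, w in scaled:
        totals[s] += w

    # Renormalize so the weights for each source phrase sum to 1 again.
    return [(s, t, w / totals[s]) for s, t, w in scaled]

# Toy phrase table with made-up entries and probabilities.
toy = [
    ("the house", "ghar", 0.6),
    ("the house", "ghar hai", 0.4),
]

# Assume only the first pair is judged syntactically well-formed.
adjusted = reweight(toy, lambda s, t: t == "ghar")
for source, target, prob in adjusted:
    print(source, "|||", target, "|||", round(prob, 4))
```

Unlike pruning, this keeps every phrase pair available to the decoder and only shifts probability mass toward the flagged pairs.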
Cite this article
Banik, D. Phrase table re-adjustment for statistical machine translation. Int J Speech Technol 24, 903–911 (2021). https://doi.org/10.1007/s10772-020-09676-0
DOI: https://doi.org/10.1007/s10772-020-09676-0