Bengali-to-English Forward and Backward Machine Transliteration Using Support Vector Machines

  • Conference paper
Computational Intelligence, Communications, and Business Analytics (CICBA 2017)

Abstract

Name transliteration is the task of transliterating out-of-vocabulary (OOV) words. It plays an important role in machine translation and cross-lingual information retrieval systems because these systems cannot directly translate OOV words. In this article, we present an SVM-based name transliteration approach that treats transliteration as a multi-class pattern classification problem: the input is a source transliteration unit (a chunk of source graphemes), and the classes are the distinct transliteration units (chunks of target graphemes) in the target language. The approach handles both forward and backward Bengali-to-English name transliteration. We compare it with an existing transliteration model that uses a modified version of the joint source-channel model. In our evaluation, the proposed SVM-based model gives the best results among the systems compared.
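The classification framing in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the aligned name pairs, the context features (current, previous, and next source unit), and all function names are assumptions made for illustration. It also swaps in a simpler technique, a linear multi-class classifier trained with hinge-loss subgradient steps and no regularization or kernel, as a stand-in for a full SVM.

```python
from collections import defaultdict

# Hypothetical toy data: each name is a list of aligned
# (Bengali unit, English unit) pairs. Real pairs would come from a corpus.
ALIGNED_NAMES = [
    [("ক", "ka"), ("ম", "ma"), ("ল", "l")],    # Kamal
    [("র", "ra"), ("মা", "ma")],                # Rama
    [("মা", "ma"), ("লা", "la")],               # Mala
    [("ক", "ka"), ("ম", "ma"), ("লা", "la")],   # Kamala
]


def features(units, i):
    """Binary context features for the i-th source unit (assumed feature set)."""
    prev_tu = units[i - 1] if i > 0 else "<s>"
    next_tu = units[i + 1] if i + 1 < len(units) else "</s>"
    return [f"cur={units[i]}", f"prev={prev_tu}", f"next={next_tu}"]


def make_instances(aligned_names):
    """Turn aligned names into (feature list, target unit) instances."""
    instances = []
    for pairs in aligned_names:
        source = [s for s, _ in pairs]
        for i, (_, target) in enumerate(pairs):
            instances.append((features(source, i), target))
    return instances


def score(weights, label, feats):
    """Linear score of one class: sum of its weights on the active features."""
    return sum(weights[(label, f)] for f in feats)


def train(instances, max_epochs=100, lr=1.0):
    """Multi-class hinge-loss training: update whenever the gold class does
    not beat the best rival class by a margin of at least 1."""
    labels = sorted({y for _, y in instances})
    weights = defaultdict(float)
    for _ in range(max_epochs):
        updates = 0
        for feats, gold in instances:
            rivals = [y for y in labels if y != gold]
            rival = max(rivals, key=lambda y: score(weights, y, feats))
            margin = score(weights, gold, feats) - score(weights, rival, feats)
            if margin < 1.0:
                for f in feats:
                    weights[(gold, f)] += lr
                    weights[(rival, f)] -= lr
                updates += 1
        if updates == 0:  # every training instance satisfies the margin
            break
    return weights, labels


def transliterate(weights, labels, source_units):
    """Classify each source unit independently into a target unit."""
    result = []
    for i in range(len(source_units)):
        feats = features(source_units, i)
        result.append(max(labels, key=lambda y: score(weights, y, feats)))
    return result


if __name__ == "__main__":
    weights, labels = train(make_instances(ALIGNED_NAMES))
    print("".join(transliterate(weights, labels, ["ক", "ম", "ল"])))
```

Swapping the two elements of each aligned pair before calling `make_instances` yields training instances for the backward (English-to-Bengali) direction under the same framing. Segmenting a raw name into transliteration units is assumed to have been done beforehand and is not shown.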

Acknowledgments

This research work has received support from the project entitled “Design and Development of a System for Querying, Clustering and Summarization for Bengali”, funded by the Department of Science and Technology, Government of India, under the SERB scheme.

Author information

Corresponding author

Correspondence to Kamal Sarkar.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sarkar, K., Chatterjee, S. (2017). Bengali-to-English Forward and Backward Machine Transliteration Using Support Vector Machines. In: Mandal, J., Dutta, P., Mukhopadhyay, S. (eds) Computational Intelligence, Communications, and Business Analytics. CICBA 2017. Communications in Computer and Information Science, vol 776. Springer, Singapore. https://doi.org/10.1007/978-981-10-6430-2_43

  • DOI: https://doi.org/10.1007/978-981-10-6430-2_43

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6429-6

  • Online ISBN: 978-981-10-6430-2

  • eBook Packages: Computer Science (R0)
