Bengali-to-English Forward and Backward Machine Transliteration Using Support Vector Machines

Sarkar, Kamal; Chatterjee, Soma

doi:10.1007/978-981-10-6430-2_43


Part of the book series: Communications in Computer and Information Science (CCIS, volume 776)

Included in the following conference series:

International Conference on Computational Intelligence, Communications, and Business Analytics


Abstract

Name transliteration deals with the transliteration of out-of-vocabulary (OOV) words. It plays an important role in developing machine translation and cross-lingual information retrieval systems, because these systems cannot directly translate OOV words. In this article, we present an SVM-based name transliteration approach that treats transliteration as a multi-class pattern classification problem, where the input is a source transliteration unit (a chunk of source graphemes) and the classes are the distinct transliteration units (chunks of target graphemes) in the target language. The proposed approach handles Bengali-to-English forward and backward name transliteration. We compare it with an existing transliteration model that uses a modified version of the joint source-channel model. The evaluation results show that the proposed SVM-based model gives the best results among the systems compared.
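To make the classification framing concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes names are already segmented and aligned into transliteration units (the three toy names and their segmentation are hand-made), uses a single one-hot feature per source unit (the abstract does not state the feature set), and trains a small linear multi-class SVM with Crammer-Singer hinge loss and L2 regularisation by stochastic subgradient descent. The function names (`make_instances`, `train_linear_svm`, `transliterate`) and the hyperparameters are illustrative.

```python
# Toy aligned transliteration units (TUs). The segmentation and alignment
# are hand-made assumptions, not data from the paper.
ALIGNED_NAMES = [
    (["ক", "ম", "ল"], ["ka", "ma", "l"]),    # Kamal
    (["ক", "ম", "লা"], ["ka", "ma", "la"]),  # Kamala
    (["নী", "ল"], ["ni", "l"]),              # Nil
]


def featurize(unit):
    """One-hot feature for the source TU (hypothetical feature set)."""
    return {"unit=" + unit: 1.0}


def make_instances(aligned, backward=False):
    """Turn aligned TU sequences into (features, class label) pairs."""
    instances = []
    for bn_units, en_units in aligned:
        src, tgt = (en_units, bn_units) if backward else (bn_units, en_units)
        for s, t in zip(src, tgt):
            instances.append((featurize(s), t))
    return instances


def score(weights, label, feats):
    """Linear score of one class for one feature dictionary."""
    return sum(weights[label].get(f, 0.0) * v for f, v in feats.items())


def train_linear_svm(instances, epochs=20, lr=0.1, lam=0.01):
    """Multi-class linear SVM (Crammer-Singer hinge loss + L2), plain SGD."""
    classes = sorted({label for _, label in instances})
    weights = {c: {} for c in classes}
    for _ in range(epochs):
        for feats, gold in instances:
            # Gradient step on the L2 regulariser: shrink every weight.
            for c in classes:
                for f in weights[c]:
                    weights[c][f] *= 1.0 - lr * lam
            # Strongest competing class for this instance.
            rival = max((c for c in classes if c != gold),
                        key=lambda c: score(weights, c, feats))
            margin = score(weights, gold, feats) - score(weights, rival, feats)
            if margin < 1.0:  # hinge loss is active: push the classes apart
                for f, v in feats.items():
                    weights[gold][f] = weights[gold].get(f, 0.0) + lr * v
                    weights[rival][f] = weights[rival].get(f, 0.0) - lr * v
    return weights


def transliterate(weights, source_units):
    """Classify each source TU independently and join the target TUs."""
    out = []
    for unit in source_units:
        feats = featurize(unit)
        out.append(max(sorted(weights),
                       key=lambda c: score(weights, c, feats)))
    return "".join(out)


# Forward: Bengali TUs -> English TUs. Backward: English TUs -> Bengali TUs.
forward_model = train_linear_svm(make_instances(ALIGNED_NAMES))
backward_model = train_linear_svm(make_instances(ALIGNED_NAMES, backward=True))

print(transliterate(forward_model, ["নী", "লা"]))       # unseen name, seen TUs
print(transliterate(backward_model, ["ka", "ma", "l"]))
```

With this toy data the forward model maps the unseen name নীলা, given as the units নী + লা, to `nila`, and the backward model maps `ka` + `ma` + `l` back to কমল. With only a one-hot feature the classifier behaves much like a learned lookup table; context features such as neighbouring units would be needed to resolve source units that have several possible targets.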



Acknowledgments

This research work has received support from the project entitled “Design and Development of a System for Querying, Clustering and Summarization for Bengali” funded by the Department of Science and Technology, Government of India under the SERB scheme.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
Kamal Sarkar & Soma Chatterjee

Corresponding author

Correspondence to Kamal Sarkar.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, India
J. K. Mandal
Department of Computer and System Sciences, Visva Bharati University, Bolpur Santiniketan, West Bengal, India
Paramartha Dutta
Department of Information Technology, Calcutta Business School, Kolkata, India
Somnath Mukhopadhyay

Cite this paper

Sarkar, K., Chatterjee, S. (2017). Bengali-to-English Forward and Backward Machine Transliteration Using Support Vector Machines. In: Mandal, J., Dutta, P., Mukhopadhyay, S. (eds) Computational Intelligence, Communications, and Business Analytics. CICBA 2017. Communications in Computer and Information Science, vol 776. Springer, Singapore. https://doi.org/10.1007/978-981-10-6430-2_43

DOI: https://doi.org/10.1007/978-981-10-6430-2_43
Published: 26 September 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6429-6
Online ISBN: 978-981-10-6430-2
eBook Packages: Computer Science, Computer Science (R0)
