Abstract
Name transliteration deals with the transliteration of out-of-vocabulary (OOV) words. It plays an important role in developing automatic machine translation and cross-lingual information retrieval systems, because these systems cannot directly translate OOV words. In this article, we present an SVM-based name transliteration approach that treats transliteration as a multi-class pattern classification problem: the input is a source transliteration unit (a chunk of source graphemes), and the classes are the distinct transliteration units (chunks of target graphemes) in the target language. The proposed approach handles Bengali-to-English forward and backward name transliteration. We also compare the proposed method with an existing transliteration model that uses a modified version of the joint source-channel model. The evaluation results show that the proposed SVM-based model gives the best results among the systems compared.
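The classification framing above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the TU segmentation rule, the feature set, the SVM toolkit or the kernel, so everything below is an assumption. The hypothetical `segment` function treats a base character plus its combining marks (and any consonant joined by a virama) as one Bengali TU; `features` encodes the current TU with a one-unit context window; `OneVsRestSVM` trains one linear SVM per target TU with the Pegasos sub-gradient method and picks the highest-scoring class. `TOY_PAIRS` is an invented, hand-aligned toy data set, not the paper's corpus, and the hyperparameters are arbitrary.

```python
"""Hypothetical sketch: name transliteration as multi-class classification of
transliteration units (TUs) with one-vs-rest linear SVMs trained by Pegasos.
TU rule, features, toy data and hyperparameters are assumptions."""
import random
import unicodedata
from collections import defaultdict

VIRAMA = "\u09cd"  # Bengali hasanta: joins consonants into a conjunct


def segment(word):
    """Assumed TU rule: a base character plus any combining marks (vowel
    signs, virama, nukta, ...); a consonant after a virama joins the same
    unit, so conjuncts stay whole."""
    units = []
    for ch in word:
        is_mark = unicodedata.category(ch).startswith("M")
        if units and (is_mark or units[-1].endswith(VIRAMA)):
            units[-1] += ch
        else:
            units.append(ch)
    return units


def features(units, i):
    """Binary features for the TU at position i, with a +/-1 TU context window."""
    cur = units[i]
    prev_u = units[i - 1] if i > 0 else "<s>"
    next_u = units[i + 1] if i + 1 < len(units) else "</s>"
    return {"bias", f"cur={cur}", f"prev={prev_u}", f"next={next_u}",
            f"cur+next={cur}|{next_u}"}


class OneVsRestSVM:
    """One linear SVM per target TU, minimising the L2-regularised hinge loss
    with Pegasos stochastic sub-gradient steps; prediction is the argmax score."""

    def __init__(self, lam=0.01, epochs=50, seed=0):
        self.lam, self.epochs, self.seed = lam, epochs, seed
        self.weights = {}  # target TU -> {feature: weight}

    def fit(self, samples, labels):
        rng = random.Random(self.seed)
        data = list(zip(samples, labels))
        for target in sorted(set(labels)):
            w = defaultdict(float)
            for t in range(1, self.epochs * len(data) + 1):
                feats, label = data[rng.randrange(len(data))]
                y = 1.0 if label == target else -1.0
                eta = 1.0 / (self.lam * t)
                margin = y * sum(w[f] for f in feats)
                for f in w:  # shrink step: w <- (1 - eta * lam) * w
                    w[f] *= 1.0 - eta * self.lam
                if margin < 1.0:  # hinge loss active: add eta * y * x
                    for f in feats:
                        w[f] += eta * y
            self.weights[target] = dict(w)
        return self

    def predict(self, feats):
        def score(target):
            w = self.weights[target]
            return sum(w.get(f, 0.0) for f in feats)
        return max(self.weights, key=score)

    def transliterate(self, word):
        """Segment a source word into TUs and classify each TU in context."""
        units = segment(word)
        return [self.predict(features(units, i)) for i in range(len(units))]


# Hypothetical toy pairs; each source TU is aligned with one target TU.
TOY_PAIRS = [
    ("কামাল", ["ka", "ma", "l"]),
    ("কমল", ["ka", "ma", "l"]),
    ("আলম", ["a", "la", "m"]),
    ("রানি", ["ra", "ni"]),
    ("মালা", ["ma", "la"]),
    ("নীলা", ["ni", "la"]),
    ("সুমন", ["su", "ma", "n"]),
]


def build_training_set(pairs):
    samples, labels = [], []
    for word, targets in pairs:
        units = segment(word)
        if len(units) != len(targets):
            raise ValueError(f"TU alignment mismatch for {word!r}: {units}")
        for i, target in enumerate(targets):
            samples.append(features(units, i))
            labels.append(target)
    return samples, labels


if __name__ == "__main__":
    X, y = build_training_set(TOY_PAIRS)
    model = OneVsRestSVM().fit(X, y)
    for name in ["কমল", "আলম", "মানি"]:
        print(name, "->", "".join(model.transliterate(name)))
```

The `cur+next` conjunction feature gives the toy model a way to map the same source TU differently by position (in the toy data, bare ল is "la" before another unit but "l" word-finally); the abstract does not say whether the paper uses such features. Under the same assumptions, and reading backward transliteration as recovering Bengali spellings from English ones, the classifier could be reused with source and target roles swapped, although English names would need their own segmentation rule.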
References
Ekbal, A., Naskar, S., Bandyopadhyay, S.: A modified joint source channel model for transliteration. In: Proceedings of the COLING-ACL, Australia, pp. 191–198 (2006)
Abdul Jaleel, N., Larkey, L.: Statistical transliteration for English-Arabic cross language information retrieval. In: Proceedings of CIKM, pp. 139–146 (2003)
Virga, P., Khudanpur, S.: Transliteration of proper names in cross-language applications. In: 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 365–366 (2003)
Lee, J.S., Choi, K.S.: English to Korean statistical transliteration for information retrieval. J. Comput. Process. Orient. Lang. 12(1), 17–37 (1998)
Jeong, K.S., Myaeng, S.H., Lee, J.S., Choi, K.S.: Automatic identification and back-transliteration of foreign words for information retrieval. J. Inform. Process. Manage. 35(1), 523–540 (1999)
Kim, J.J., Lee, J.S., Choi, K.S.: Pronunciation unit based automatic English-Korean transliteration model using neural network. In: Proceedings of Korea Cognitive Science Association, pp. 247–252 (1999)
Lee, J.S.: An English-Korean transliteration and re-transliteration model for cross-lingual information retrieval. Ph.D. thesis, Computer Science Dept., KAIST (1999)
Kang, B.J., Choi, K.S.: Automatic transliteration and back-transliteration by decision tree learning. In: 2nd International Conference on Language Resources and Evaluation, pp. 1135–1141 (2000)
Kang, I.H., Kim, G.C.: English-to-Korean transliteration using multiple unbounded overlapping phoneme chunks. In: 18th International Conference on Computational Linguistics, pp. 418–424 (2000)
Kang, B.J.: A resolution of word mismatch problem caused by foreign word transliterations and English words in Korean information retrieval. Ph.D. thesis, Computer Science Dept., KAIST (2001)
Goto, I., Kato, N., Uratani, N., Ehara, T.: Transliteration considering context information based on the maximum entropy method. In: Proceedings of MT-Summit IX, pp. 125–132 (2003)
Li, H., Zhang, M., Su, J.: A joint source-channel model for machine transliteration. In: Proceedings of ACL, pp. 160–167 (2004)
Knight, K., Graehl, J.: Machine transliteration. In: 35th Annual Meeting of the Association for Computational Linguistics, pp. 128–135 (1997)
Jung, S.Y., Hong, S., Paek, E.: An English to Korean transliteration model of extended Markov window. In: 18th Conference on Computational Linguistics, pp. 383–389 (2000)
Meng, H., Lo, W.-K., Chen, B., Tang, K.: Generating phonetic cognates to handle named entities in English-Chinese cross-language spoken document retrieval. In: Proceedings of Automatic Speech Recognition and Understanding, ASRU 2001, pp. 311–314 (2001)
Al-Onaizan, Y., Knight, K.: Translating named entities using monolingual and bilingual resources. In: Proceedings of ACL, pp. 400–408 (2002)
Bilac, S., Tanaka, H.: Improving back-transliteration by combining information sources. In: Su, K.-Y., Tsujii, J., Lee, J.-H., Kwong, O.Y. (eds.) IJCNLP 2004. LNCS, vol. 3248, pp. 216–223. Springer, Heidelberg (2005). doi:10.1007/978-3-540-30211-7_23
Stalls, B.G., Knight, K.: Translating names and technical terms in Arabic text. In: Proceedings of the Workshop on Computational Approaches to Semitic Languages, pp. 34–41. Association for Computational Linguistics, August 1998
Antony, P.J., Ajith, V.P., Soman, K.P.: Kernel method for English to Kannada transliteration. In: International Conference IEEE, Recent Trends in Information, Telecommunication and Computing (ITC), pp. 336–338 (2010)
Rathod, H., Dhore, M.L., Dhore, R.M.: Hindi and Marathi to English machine transliteration using SVM. Int. J. Natural Lang. Comput. (IJNLC) 2(4), 55–71 (2013)
Dhore, M.L., Dixit, S.K., Sonwalkar, T.D.: Hindi to English machine transliteration of named entities using conditional random fields. Int. J. Comput. Appl. 48(23), 31–37 (2012)
Oh, J.H., Choi, K.S.: An English-Korean transliteration model using pronunciation and contextual rules. In: 19th International Conference on Computational Linguistics, Association for Computational Linguistics, vol. 1, pp. 1–7 (2002)
Bhalla, D., Joshi, N., Mathur, I.: Rule based transliteration scheme for English to Punjabi. arXiv preprint arXiv:1307.4300 (2013)
Deep, K., Goyal, V.: Development of a Punjabi to English transliteration system. Int. J. Comput. Sci. Commun. 2(2), 521–526 (2011)
Das, A., Saikh, T., Mondal, T., Ekbal, A., Bandyopadhyay, S.: English to Indian languages machine transliteration system at NEWS 2010. In: Proceedings of the 2010 Named Entities Workshop, Association for Computational Linguistics, pp. 71–75 (2010)
Li, H., Zhang, M., Su, J.: A joint source-channel model for machine transliteration. In: ACL (2004)
Rama, T., Gali, K.: Modeling machine transliteration as a phrase based statistical machine translation problem. In: Proceedings of the Named Entities Workshop, Shared Task on Transliteration, pp. 124–127. Association for Computational Linguistics (2009)
Josan, G., Lehal, G.: A Punjabi to Hindi machine transliteration system. Int. J. Comput. Linguist. Chin. Lang. Process. 15(2), 77–102 (2010)
Josan, G., Kaur, J.: Punjabi to Hindi statistical machine transliteration system. Int. J. Inform. Technol. Knowl. Manage. 4, 459–463 (2011)
Acknowledgments
This research work has received support from the project entitled “Design and Development of a System for Querying, Clustering and Summarization for Bengali”, funded by the Department of Science and Technology, Government of India, under the SERB scheme.
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sarkar, K., Chatterjee, S. (2017). Bengali-to-English Forward and Backward Machine Transliteration Using Support Vector Machines. In: Mandal, J., Dutta, P., Mukhopadhyay, S. (eds) Computational Intelligence, Communications, and Business Analytics. CICBA 2017. Communications in Computer and Information Science, vol 776. Springer, Singapore. https://doi.org/10.1007/978-981-10-6430-2_43
DOI: https://doi.org/10.1007/978-981-10-6430-2_43
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6429-6
Online ISBN: 978-981-10-6430-2
eBook Packages: Computer Science, Computer Science (R0)