Abstract
The present work analyzes social media data that is code-switched and transliterated, converting it into English using a special kind of recurrent neural network (RNN) called the Long Short-Term Memory (LSTM) network. TensorFlow is used to implement the LSTM model. Twitter data is stored in MongoDB to enable easy handling and processing. The data is parsed field by field with a Python script and cleaned using regular expressions. The LSTM model is trained on 1 M records and is then used for transliteration and translation of the Twitter data. Translating and transliterating social media data makes the content accessible in a language understood by the majority of the population, so that any content that is anti-social or a threat to law and order can be readily verified and blocked at the source.
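The regex-based cleaning step described above can be sketched as follows. This is a minimal illustration, not the authors' actual script: the patterns and the function name `clean_tweet` are assumptions about what such preprocessing typically removes from raw tweet text (URLs, @-mentions, stray symbols) before it is fed to the LSTM model.

```python
import re

# Hypothetical preprocessing sketch; patterns are assumptions,
# not the paper's actual regular expressions.
URL_RE = re.compile(r"https?://\S+")       # strip links
MENTION_RE = re.compile(r"@\w+")           # strip @-mentions
NON_TEXT_RE = re.compile(r"[^A-Za-z0-9\s#]")  # strip symbols, keep hashtags
WS_RE = re.compile(r"\s+")                 # collapse whitespace

def clean_tweet(text: str) -> str:
    """Return tweet text with URLs, mentions, and stray symbols removed."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = NON_TEXT_RE.sub(" ", text)
    return WS_RE.sub(" ", text).strip()

print(clean_tweet("Check this @user https://t.co/abc out!!"))
```

In a pipeline like the one described, each cleaned string would then be tokenized and passed to the trained sequence-to-sequence LSTM for translation or transliteration.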
Vathsala, M.K., Holi, G. RNN based machine translation and transliteration for Twitter data. Int J Speech Technol 23, 499–504 (2020). https://doi.org/10.1007/s10772-020-09724-9