RNN based machine translation and transliteration for Twitter data

International Journal of Speech Technology

Abstract

The present work analyzes social media data for code-switching and translates and transliterates it into English using a Long Short-Term Memory (LSTM) network, a special kind of recurrent neural network (RNN). TensorFlow is used to express the LSTM model. Twitter data is stored in MongoDB to enable easy handling and processing; the data is parsed into its fields with a Python script and cleaned using regular expressions. The LSTM model is trained on 1 M records and then used for transliteration and translation of the Twitter data. Translating and transliterating social media content makes it available in the language understood by the majority of the population, so that any content that is anti-social or a threat to law and order can be verified and blocked at the source.
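The preprocessing step described above, cleaning raw tweets with regular expressions before they are fed to the model, might look like the following minimal Python sketch. The specific patterns (URLs, @-mentions, hashtag marks, stray punctuation) are illustrative assumptions; the paper does not publish its exact cleaning rules.

```python
import re

def clean_tweet(text: str) -> str:
    """Normalize a raw tweet with regular expressions.

    The exact rules used in the paper are not published; these
    patterns are illustrative assumptions.
    """
    text = re.sub(r"https?://\S+", "", text)   # drop URLs
    text = re.sub(r"@\w+", "", text)           # drop @-mentions
    text = re.sub(r"#", "", text)              # keep hashtag word, drop '#'
    text = re.sub(r"[^\w\s]", "", text)        # drop remaining punctuation
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace

print(clean_tweet("Check this out http://t.co/abc @user #breaking news!!"))
# → "Check this out breaking news"
```

In a full pipeline, a function like this would be applied to the tweet text field after the record is read from MongoDB and before tokenization.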
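The abstract names the LSTM architecture but not its equations. For reference, a single step of the standard LSTM cell that such a model is built from can be sketched in plain NumPy; the shapes, random weights, and function names here are illustrative, not taken from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    x: input vector (D,); h_prev, c_prev: previous hidden/cell state (H,)
    W: (4H, D), U: (4H, H), b: (4H,) hold the stacked gate parameters.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b          # all four gate pre-activations at once
    i = sigmoid(z[:H])                  # input gate
    f = sigmoid(z[H:2 * H])             # forget gate
    o = sigmoid(z[2 * H:3 * H])         # output gate
    g = np.tanh(z[3 * H:])              # candidate cell update
    c = f * c_prev + i * g              # new cell state
    h = o * np.tanh(c)                  # new hidden state
    return h, c
```

The forget gate is what lets the cell carry context across a long input sequence, which is why the LSTM variant of the RNN is the natural choice for sentence-level translation and transliteration.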


Figures 1–5 (available in the full article)


Author information

Corresponding author

Correspondence to M. K. Vathsala.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Vathsala, M.K., Holi, G. RNN based machine translation and transliteration for Twitter data. Int J Speech Technol 23, 499–504 (2020). https://doi.org/10.1007/s10772-020-09724-9

