RNN based machine translation and transliteration for Twitter data

International Journal of Speech Technology

Abstract

The present work analyzes social media data for code-switching and translates and transliterates it into English using a Long Short-Term Memory (LSTM) network, a special kind of recurrent neural network (RNN). TensorFlow is used to express the LSTM model. Twitter data is stored in MongoDB to enable easy handling and processing; the data is parsed into its fields with a Python script and cleaned using regular expressions. The LSTM model is trained on 1 M records and then used for transliteration and translation of the Twitter data. Translating and transliterating social media content makes it available in the language understood by the majority of the population, so that any content that is anti-social or a threat to law and order can be verified and blocked at the source.
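The preprocessing step described above, cleaning raw tweets with regular expressions before they are fed to the model, might look like the following minimal Python sketch. The specific patterns (URLs, @-mentions, hashtag marks, stray punctuation) are illustrative assumptions; the paper does not publish its exact cleaning rules.

```python
import re

def clean_tweet(text: str) -> str:
    """Normalize a raw tweet with regular expressions.

    The exact rules used in the paper are not published; these
    patterns are illustrative assumptions.
    """
    text = re.sub(r"https?://\S+", "", text)   # drop URLs
    text = re.sub(r"@\w+", "", text)           # drop @-mentions
    text = re.sub(r"#", "", text)              # keep hashtag word, drop '#'
    text = re.sub(r"[^\w\s]", "", text)        # drop remaining punctuation
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace

print(clean_tweet("Check this out http://t.co/abc @user #breaking news!!"))
# → "Check this out breaking news"
```

In a full pipeline, a function like this would be applied to the tweet text field after the record is read from MongoDB and before tokenization.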
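The abstract names the LSTM architecture but not its equations. For reference, a single step of the standard LSTM cell that such a model is built from can be sketched in plain NumPy; the shapes, random weights, and function names here are illustrative, not taken from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    x: input vector (D,); h_prev, c_prev: previous hidden/cell state (H,)
    W: (4H, D), U: (4H, H), b: (4H,) hold the stacked gate parameters.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b          # all four gate pre-activations at once
    i = sigmoid(z[:H])                  # input gate
    f = sigmoid(z[H:2 * H])             # forget gate
    o = sigmoid(z[2 * H:3 * H])         # output gate
    g = np.tanh(z[3 * H:])              # candidate cell update
    c = f * c_prev + i * g              # new cell state
    h = o * np.tanh(c)                  # new hidden state
    return h, c
```

The forget gate is what lets the cell carry context across a long input sequence, which is why the LSTM variant of the RNN is the natural choice for sentence-level translation and transliteration.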


Figures 1–5 (available in the full article)


Author information

Corresponding author

Correspondence to M. K. Vathsala.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Vathsala, M.K., Holi, G. RNN based machine translation and transliteration for Twitter data. Int J Speech Technol 23, 499–504 (2020). https://doi.org/10.1007/s10772-020-09724-9

