
Telugu named entity recognition using BERT

  • Regular Paper
International Journal of Data Science and Analytics

Abstract

Named entity recognition (NER), the task of classifying words into a predefined set of named entity (NE) classes, is a fundamental step for many natural language processing tasks. For high-resource languages such as English, many deep learning architectures have produced good results. However, NER has seen little progress for Telugu, a low-resource language. This paper performs NER on Telugu using Word2Vec, GloVe, FastText, contextual string embeddings, and bidirectional encoder representations from transformers (BERT) embeddings generated from Telugu Wikipedia articles. These embeddings are used as input to deep learning models. We also investigated the effect of concatenating handcrafted features with the word embeddings on the models' performance. Our experimental results demonstrate that BERT embeddings concatenated with handcrafted features outperform the other word embedding models, achieving an F1-score of 96.32%.
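The central technical step described above is concatenating handcrafted features with each token's word embedding before the sequence is passed to a deep learning model. The Python snippet below is a minimal sketch of that concatenation only. The feature set, the function names `handcrafted_features` and `build_inputs`, the toy gazetteer, and the 3-dimensional embedding table are all hypothetical illustrations, not the paper's actual features or code. With BERT, the static lookup table would instead be replaced by the contextual vector the model produces for each token.

```python
def handcrafted_features(token: str, position: int, sentence_len: int,
                         gazetteer: set[str]) -> list[float]:
    """Hypothetical token-level features; NOT the paper's actual feature set."""
    return [
        1.0 if token.isdigit() else 0.0,                # numeric token (also true for Telugu digits)
        1.0 if token in gazetteer else 0.0,             # token appears in a name list
        len(token) / 20.0,                              # crude length feature (Unicode code points)
        1.0 if position == 0 else 0.0,                  # first token of the sentence
        1.0 if position == sentence_len - 1 else 0.0,   # last token of the sentence
    ]


def build_inputs(tokens: list[str],
                 embeddings: dict[str, list[float]],
                 dim: int,
                 gazetteer: set[str]) -> list[list[float]]:
    """Concatenate each token's embedding with its handcrafted feature vector."""
    n = len(tokens)
    inputs = []
    for i, tok in enumerate(tokens):
        emb = embeddings.get(tok, [0.0] * dim)  # zero vector for out-of-vocabulary tokens
        inputs.append(list(emb) + handcrafted_features(tok, i, n, gazetteer))
    return inputs


# Toy usage: a 3-dimensional embedding table and a one-entry gazetteer (both hypothetical).
gazetteer = {"హైదరాబాద్"}
embeddings = {"హైదరాబాద్": [0.1, 0.2, 0.3]}
x = build_inputs(["హైదరాబాద్", "2020"], embeddings, dim=3, gazetteer=gazetteer)
for row in x:
    print(len(row))  # 3 embedding dims + 5 features = 8 values per token
```

Each resulting row would be one time step of the input sequence to a sequence labeller. Capitalization, a common feature in English NER, is omitted here because Telugu script has no letter case.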

Author information

Correspondence to SaiKiranmai Gorla.

About this article

Cite this article

Gorla, S., Tangeda, S.S., Neti, L.B.M. et al. Telugu named entity recognition using BERT. Int J Data Sci Anal 14, 127–140 (2022). https://doi.org/10.1007/s41060-021-00305-w

  • DOI: https://doi.org/10.1007/s41060-021-00305-w