Telugu named entity recognition using bert

Gorla, SaiKiranmai; Tangeda, Sai Sharan; Neti, Lalita Bhanu Murthy; Malapati, Aruna

doi:10.1007/s41060-021-00305-w

Regular Paper
Published: 18 January 2022

Volume 14, pages 127–140, (2022)
International Journal of Data Science and Analytics

SaiKiranmai Gorla ORCID: orcid.org/0000-0001-7383-5234¹,
Sai Sharan Tangeda¹,
Lalita Bhanu Murthy Neti¹ &
…
Aruna Malapati¹

Abstract

Named entity recognition (NER) is a fundamental step in many natural language processing tasks; it aims to classify words into a predefined set of named entities (NE). For high-resource languages like English, many deep learning architectures have produced good results. However, NER has not yet seen much progress for Telugu, a low-resource language. This paper performs NER on Telugu using Word2Vec, GloVe, FastText, contextual string embeddings, and bidirectional encoder representations from transformers (BERT) embeddings generated from Telugu Wikipedia articles. These embeddings are used as input to deep learning models. We also investigate the effect of concatenating handcrafted features with the word embeddings on the models' performance. Our experimental results demonstrate that BERT embeddings combined with handcrafted features outperform the other word embedding models, with an F1-score of 96.32%.
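
The idea of concatenating handcrafted features with word embeddings can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not list the handcrafted features, so the three features here (contains a digit, token length, sentence-initial position) are hypothetical, and `TOY_EMBEDDINGS` holds placeholder values rather than real Word2Vec, GloVe, FastText, or BERT vectors.

```python
from typing import Dict, List

# Toy 4-dimensional "embeddings": placeholder values, not real Telugu vectors.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "హైదరాబాద్": [0.1, 0.3, -0.2, 0.5],
    "2022": [0.0, -0.1, 0.4, 0.2],
}
EMBEDDING_DIM = 4


def handcrafted_features(token: str, position: int) -> List[float]:
    """Hypothetical features; the abstract does not specify the real feature set."""
    return [
        1.0 if any(ch.isdigit() for ch in token) else 0.0,  # contains a digit
        float(len(token)),                                   # length in code points
        1.0 if position == 0 else 0.0,                       # sentence-initial token
    ]


def token_representation(token: str, position: int) -> List[float]:
    """Concatenate the token's embedding with its handcrafted feature vector."""
    # Unknown tokens fall back to a zero vector in this sketch.
    embedding = TOY_EMBEDDINGS.get(token, [0.0] * EMBEDDING_DIM)
    return embedding + handcrafted_features(token, position)


sentence = ["హైదరాబాద్", "2022"]
inputs = [token_representation(tok, i) for i, tok in enumerate(sentence)]
```

Each element of `inputs` is a 7-dimensional vector (4 embedding dimensions followed by 3 feature dimensions) that a sequence-labeling model could consume in place of the bare embedding.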

Author information

Authors and Affiliations

Department of Computer Science and Information Systems, Birla Institute of Technology and Science Pilani, Hyderabad Campus, Hyderabad, India
SaiKiranmai Gorla, Sai Sharan Tangeda, Lalita Bhanu Murthy Neti & Aruna Malapati

Corresponding author

Correspondence to SaiKiranmai Gorla.

About this article

Cite this article

Gorla, S., Tangeda, S.S., Neti, L.B.M. et al. Telugu named entity recognition using bert. Int J Data Sci Anal 14, 127–140 (2022). https://doi.org/10.1007/s41060-021-00305-w

Received: 16 April 2021
Accepted: 16 December 2021
Published: 18 January 2022
Issue Date: August 2022
DOI: https://doi.org/10.1007/s41060-021-00305-w
