
HILNER: A Hindi Language Named Entity Recognition System Based on Hybrid Approach

  • Conference paper
Hybrid Intelligent Systems (HIS 2020)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1375)

Abstract

This paper addresses named entity recognition (NER) for the Hindi language. It proposes a framework for extracting named entities (NEs) from Hindi text using a hybrid model that combines a rule-based approach with a Conditional Random Fields (CRF) based machine learning (ML) approach. The main goal is to exploit the advantages of both approaches to obtain a more effective hybrid model and to study its effect on the resulting NER system. The experiments show that the hybrid approach can achieve higher performance than either an independent rule-based system or an independent machine learning system on the NER task. The paper also identifies language-specific characteristics of Hindi and uses them to prepare a feature set that gives better accuracy when used to train statistical models. The proposed methods have been evaluated with carefully designed experiments, following standard practice in the field. The system achieves overall recall, precision, and F-score values of 98.21%, 99.51%, and 98.82%, respectively.
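
The abstract does not say how the rule-based and CRF components are combined. One common way to build such a hybrid is to pass the output of hand-written rules to the sequence model as an extra feature, next to lexical and contextual features. The following is a minimal sketch of that idea only, not the paper's method: the gazetteer entries, title cues, tag names, and feature names are all hypothetical, and the resulting feature dictionaries are the kind of input a CRF toolkit would be trained on.

```python
from typing import Dict, List

# Hypothetical gazetteer standing in for the rule-based component.
GAZETTEER: Dict[str, str] = {
    "दिल्ली": "LOC",
    "भारत": "LOC",
}

# Hypothetical cue words: a title suggests the next token is a person name.
PERSON_TITLES = {"श्री", "डॉ."}


def rule_tags(tokens: List[str]) -> List[str]:
    """Tag each token with hand-written rules; 'O' means no rule fired."""
    tags = []
    for i, tok in enumerate(tokens):
        if tok in GAZETTEER:
            tags.append(GAZETTEER[tok])
        elif i > 0 and tokens[i - 1] in PERSON_TITLES:
            tags.append("PER")
        else:
            tags.append("O")
    return tags


def token_features(tokens: List[str], rtags: List[str], i: int) -> Dict[str, object]:
    """Build the feature dict for token i, including the rule tag as a feature."""
    tok = tokens[i]
    return {
        "word": tok,
        "suffix2": tok[-2:],  # last two code points, not necessarily two syllables
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
        "is_digit": tok.isdigit(),
        "rule_tag": rtags[i],  # the hybrid part: rule output seen by the learner
    }


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature dicts for a whole sentence, one per token."""
    rtags = rule_tags(tokens)
    return [token_features(tokens, rtags, i) for i in range(len(tokens))]


if __name__ == "__main__":
    sentence = ["श्री", "मोहन", "दिल्ली", "गए"]
    for feats in sentence_features(sentence):
        print(feats)
```

With this design the statistical model can learn how far to trust each rule, since the rule output is one feature among several and does not override the model's prediction.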

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Srivastava, S. (2021). HILNER: A Hindi Language Named Entity Recognition System Based on Hybrid Approach. In: Abraham, A., Hanne, T., Castillo, O., Gandhi, N., Nogueira Rios, T., Hong, TP. (eds) Hybrid Intelligent Systems. HIS 2020. Advances in Intelligent Systems and Computing, vol 1375. Springer, Cham. https://doi.org/10.1007/978-3-030-73050-5_34
