HILNER: A Hindi Language Named Entity Recognition System Based on Hybrid Approach

Srivastava, Shilpi

doi:10.1007/978-3-030-73050-5_34

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1375)

Included in the following conference series:

International Conference on Hybrid Intelligent Systems

Abstract

This paper addresses named entity recognition (NER) for the Hindi language. It proposes an efficient and effective framework for extracting named entities (NEs) from Hindi text using a hybrid model that combines a rule-based approach with a machine learning (ML) approach based on conditional random fields (CRF). The main goal of this research is to exploit the advantages of both approaches to build a more efficient hybrid model and to study the effects on the resulting NER system. The experiments show that the hybrid approach can achieve higher performance than either an independent rule-based system or an independent machine learning system on the NER task. The paper also identifies the specifics of the Hindi language and uses them to prepare an efficient and appropriate feature set, which gives better accuracy when used to train statistical models. The proposed methods have been evaluated through carefully designed experiments that follow standard practice in the field. The system achieves overall recall, precision, and F-score values of 98.21%, 99.51%, and 98.82%, respectively.
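
The recall, precision, and F-score reported in the abstract are the usual NER evaluation measures. As a rough illustration only, the Python sketch below computes them at the entity level from sets of (start, end, type) spans. The function name `prf`, the span representation, and the toy spans are assumptions made for this example; they are not taken from the paper, and the paper's own evaluation procedure may differ (for instance in how scores are averaged across entity types).

```python
def prf(gold, predicted):
    """Entity-level precision, recall and F-score.

    `gold` and `predicted` are iterables of (start, end, entity_type) spans.
    A predicted entity counts as correct only if its boundaries and type
    both match a gold entity exactly (an assumption of this sketch).
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    # F-score as the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical toy data: token offsets and labels are invented for illustration.
gold_spans = {(0, 1, "PERSON"), (3, 4, "LOCATION"), (6, 8, "ORGANIZATION")}
predicted_spans = {(0, 1, "PERSON"), (3, 4, "LOCATION"), (9, 10, "PERSON")}

precision, recall, f_score = prf(gold_spans, predicted_spans)
print(f"precision={precision:.4f} recall={recall:.4f} f-score={f_score:.4f}")
```

Exact span-and-type matching is the stricter of the common conventions; a system evaluated with partial-match credit would report different numbers from the same predictions.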

Author information

Authors and Affiliations

Mumbai University, Mumbai, India
Shilpi Srivastava

Editor information

Editors and Affiliations

Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Ajith Abraham
Institute for Information Systems, University of Applied Sciences and Arts Northwestern Switzerland, Olten, Solothurn, Switzerland
Thomas Hanne
Division of Graduate Studies, Tijuana Institute of Technology, Tijuana, Mexico
Oscar Castillo
Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Niketa Gandhi
Universidade Federal da Bahia, Salvador, Bahia, Brazil
Tatiane Nogueira Rios
Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan
Tzung-Pei Hong

Cite this paper

Srivastava, S. (2021). HILNER: A Hindi Language Named Entity Recognition System Based on Hybrid Approach. In: Abraham, A., Hanne, T., Castillo, O., Gandhi, N., Nogueira Rios, T., Hong, TP. (eds) Hybrid Intelligent Systems. HIS 2020. Advances in Intelligent Systems and Computing, vol 1375. Springer, Cham. https://doi.org/10.1007/978-3-030-73050-5_34

DOI: https://doi.org/10.1007/978-3-030-73050-5_34
Published: 17 April 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73049-9
Online ISBN: 978-3-030-73050-5
eBook Packages: Intelligent Technologies and Robotics (R0)