Abstract
This paper addresses named entity recognition (NER) for the Hindi language. It proposes a framework for extracting Hindi named entities (NEs) from Hindi text using a hybrid model that combines a rule-based approach with a machine learning (ML) approach based on Conditional Random Fields (CRF). The main goal of this research is to exploit the advantages of both approaches, build a more efficient hybrid model, and study its effect on the resulting NER system. The experiments show that the hybrid approach can achieve higher performance than either an independent rule-based system or an independent machine learning system on the NER task. The paper also identifies characteristics specific to the Hindi language and uses them to prepare an appropriate feature set which, when used to train statistical models, gives better accuracy. The proposed methods have been evaluated through carefully designed experiments that follow standard practice in the field. The system achieves overall recall, precision, and F-score values of 98.21%, 99.51%, and 98.82%, respectively.
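The abstract does not spell out the rules, the feature set, or how the two components are combined, so the following is only a minimal sketch, under stated assumptions, of how a rule-based tagger and a CRF tagger's output might be merged and scored. The gazetteer entries, the honorific rule, the feature names, and the "rules take priority" merge policy are all hypothetical illustrations, not the paper's method. No CRF is trained here: `crf_labels` stands in for the per-token output of a CRF toolkit (for example CRF++), and `token_features` only shows the kind of per-token features such a model could be trained on.

```python
from typing import Dict, List, Tuple

# Hypothetical rule resources for illustration only; the paper's actual rules are not given.
GAZETTEER: Dict[str, str] = {"दिल्ली": "LOC", "भारत": "LOC", "गंगा": "LOC"}
HONORIFICS = {"श्री", "श्रीमती", "डॉ."}  # assumed rule: a token right after one of these is a person name


def rule_tag(tokens: List[str]) -> List[str]:
    """Rule-based tagger: gazetteer lookup, then the honorific-context rule. 'O' means no rule fired."""
    labels = []
    for i, tok in enumerate(tokens):
        if tok in GAZETTEER:
            labels.append(GAZETTEER[tok])
        elif i > 0 and tokens[i - 1] in HONORIFICS:
            labels.append("PER")
        else:
            labels.append("O")
    return labels


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Assumed CRF features: the word, its affixes, a digit flag, and a +/-1 word context window."""
    tok = tokens[i]
    return {
        "word": tok,
        "suffix2": tok[-2:],  # slices Unicode code points, so a suffix may end on a vowel sign
        "suffix3": tok[-3:],
        "prefix2": tok[:2],
        "is_digit": tok.isdigit(),
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1] if i + 1 < len(tokens) else "<EOS>",
        "prev_is_honorific": i > 0 and tokens[i - 1] in HONORIFICS,
    }


def hybrid_merge(rule_labels: List[str], crf_labels: List[str]) -> List[str]:
    """Assumed policy: keep the rule label wherever a rule fired, otherwise fall back to the CRF label."""
    return [r if r != "O" else c for r, c in zip(rule_labels, crf_labels)]


def prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Token-level precision, recall and F-score over entity (non-'O') labels."""
    correct = sum(1 for g, p in zip(gold, pred) if p != "O" and g == p)
    predicted = sum(1 for p in pred if p != "O")
    actual = sum(1 for g in gold if g != "O")
    precision = correct / predicted if predicted else 0.0
    recall = correct / actual if actual else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    # "Shri Ram went from Delhi to Kanpur yesterday" (toy sentence, hand-labelled for this example)
    tokens = ["श्री", "राम", "कल", "दिल्ली", "से", "कानपुर", "गए"]
    gold = ["O", "PER", "O", "LOC", "O", "LOC", "O"]
    crf_labels = ["O", "O", "O", "LOC", "O", "LOC", "O"]  # stand-in for a CRF tagger's output
    rules = rule_tag(tokens)
    merged = hybrid_merge(rules, crf_labels)
    print("rules :", rules, prf(gold, rules))
    print("crf   :", crf_labels, prf(gold, crf_labels))
    print("hybrid:", merged, prf(gold, merged))
```

In this toy sentence the honorific rule recovers the person name that the stand-in CRF output misses, while the CRF output supplies a location missing from the gazetteer, so the merged labels score higher than either component alone. Giving rules priority assumes hand-written rules are high-precision; other plausible designs let the CRF decide, or feed the rule output to the CRF as an extra feature. The scoring here is token-level; entity-level (span) scoring is also common, and the abstract does not say which scheme produced the reported figures.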
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Srivastava, S. (2021). HILNER: A Hindi Language Named Entity Recognition System Based on Hybrid Approach. In: Abraham, A., Hanne, T., Castillo, O., Gandhi, N., Nogueira Rios, T., Hong, TP. (eds) Hybrid Intelligent Systems. HIS 2020. Advances in Intelligent Systems and Computing, vol 1375. Springer, Cham. https://doi.org/10.1007/978-3-030-73050-5_34
DOI: https://doi.org/10.1007/978-3-030-73050-5_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73049-9
Online ISBN: 978-3-030-73050-5
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)