Resolution of Data Sparseness in Named Entity Recognition Using Hierarchical Features and Feature Relaxation Principle

Zhou, Guodong; Su, Jian; Yang, Lingpeng

doi:10.1007/978-3-540-30586-6_84

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3406)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

This paper introduces a Mutual Information Independence Model (MIIM) and proposes a feature relaxation principle that uses hierarchical features to resolve the data sparseness problem in MIIM-based named entity recognition. The result is a named entity recognition system with better performance and better portability. Evaluated on the MUC-6 and MUC-7 English named entity tasks, the system achieves F-measures of 96.1% and 93.7%, respectively. The evaluation also shows that, with the hierarchical structure in the features, 20K words of training data are enough to reach a performance of 90 percent, compared with 30K words without it. This suggests that hierarchical features offer the potential for much better portability.
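
The abstract does not spell out the feature hierarchy or the relaxation procedure, so the following is only a minimal sketch of the general idea of relaxing a sparse feature pattern to a more general one. Everything in it is an assumption made for illustration: the two-level hierarchy (word plus word shape, then shape alone, then no constraint), the names `word_shape`, `relaxation_chain`, and `BackoffEstimator`, and the `min_count` threshold. It uses plain relative-frequency estimates with back-off and is not the paper's MIIM or its actual feature set.

```python
from collections import Counter


def word_shape(word):
    """Map a word to a coarse surface class (hypothetical feature, for illustration)."""
    if word.isdigit():
        return "DIGITS"
    if word.isupper():
        return "ALLCAPS"
    if word[:1].isupper():
        return "INITCAP"
    return "LOWER"


def relaxation_chain(word):
    """List feature patterns from most specific to most general (assumed hierarchy)."""
    shape = word_shape(word)
    return [
        ("word+shape", word.lower(), shape),  # most specific: the word itself
        ("shape", shape),                     # relaxed: only the surface class
        ("any",),                             # fully relaxed: no constraint
    ]


class BackoffEstimator:
    """Relative-frequency tag estimates that relax sparse patterns to general ones."""

    def __init__(self, min_count=2):
        self.min_count = min_count
        self.pattern_counts = Counter()
        self.pattern_tag_counts = Counter()

    def train(self, tagged_words):
        # Count every level of the hierarchy for each training token.
        for word, tag in tagged_words:
            for pattern in relaxation_chain(word):
                self.pattern_counts[pattern] += 1
                self.pattern_tag_counts[(pattern, tag)] += 1

    def estimate(self, word, tag):
        """Return (P(tag | pattern), pattern) for the first pattern seen often enough."""
        for pattern in relaxation_chain(word):
            n = self.pattern_counts[pattern]
            if n > 0 and n >= self.min_count:
                return self.pattern_tag_counts[(pattern, tag)] / n, pattern
        return 0.0, None


# Toy training data, invented for this sketch.
training = [
    ("Singapore", "LOC"),
    ("Singapore", "LOC"),
    ("Springer", "ORG"),
    ("the", "O"),
    ("in", "O"),
]
estimator = BackoffEstimator(min_count=2)
estimator.train(training)

print(estimator.estimate("Singapore", "LOC"))  # frequent word: specific pattern is used
print(estimator.estimate("Berlin", "LOC"))     # unseen word: relaxed to its shape
```

In this toy setup an unseen capitalized word such as "Berlin" still receives a non-zero estimate because its pattern is relaxed to the shape level, which is the kind of behaviour that could let a system reach a given accuracy with less training data.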

Author information

Authors and Affiliations

Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore
Guodong Zhou, Jian Su & Lingpeng Yang

Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
Alexander Gelbukh

About this paper

Cite this paper

Zhou, G., Su, J., Yang, L. (2005). Resolution of Data Sparseness in Named Entity Recognition Using Hierarchical Features and Feature Relaxation Principle. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_84

DOI: https://doi.org/10.1007/978-3-540-30586-6_84
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24523-0
Online ISBN: 978-3-540-30586-6
eBook Packages: Computer Science, Computer Science (R0)
