Improving the Performance of a NER System by Post-processing, Context Patterns and Voting

  • Conference paper
Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy (ICCPOL 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5459)

Abstract

This paper reports on the development of a Named Entity Recognition (NER) system for Bengali that combines the outputs of two classifiers, a Conditional Random Field (CRF) and a Support Vector Machine (SVM). Lexical context patterns, generated in an unsupervised way from an unlabeled corpus of 10 million wordforms, are used as classifier features to improve performance. We post-process the models by considering the second-best tag of the CRF and by applying a class-splitting technique to the SVM. Finally, the classifiers are combined into a final system using three weighted voting techniques. Experimental results show the effectiveness of the proposed approach, with overall average recall, precision, and F-score values of 91.33%, 88.19%, and 89.73%, respectively.
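
As a rough illustration of the weighted voting idea mentioned in the abstract, the sketch below combines per-token tags from several taggers by summing a weight for each vote. It is not the authors' implementation: the abstract does not say how the three weighting schemes are defined, so the system names (`crf`, `svm`, `crf_post`), the weights, and the BIO-style tags are all hypothetical. The `f_score` helper uses the standard balanced F-score (the harmonic mean of precision and recall), which appears consistent with the reported figures.

```python
from collections import defaultdict


def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def weighted_vote(predictions, weights):
    """Combine per-token tags from several taggers by weighted voting.

    predictions: dict mapping system name -> list of tags (equal lengths)
    weights:     dict mapping system name -> non-negative weight
    """
    lengths = {len(tags) for tags in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all systems must tag the same number of tokens")
    combined = []
    for i in range(lengths.pop()):
        scores = defaultdict(float)
        for name, tags in predictions.items():
            scores[tags[i]] += weights[name]
        # Tag with the highest total weight wins; ties go to the
        # alphabetically first tag so the result is deterministic.
        combined.append(max(sorted(scores), key=scores.get))
    return combined


# Hypothetical outputs for a four-token sentence (illustrative only).
predictions = {
    "crf":      ["B-PER", "I-PER", "O", "B-LOC"],
    "svm":      ["B-PER", "O",     "O", "B-ORG"],
    "crf_post": ["B-PER", "O",     "O", "B-LOC"],
}
# Hypothetical weights; in practice these might be derived from each
# system's held-out accuracy or F-score.
weights = {"crf": 0.4, "svm": 0.35, "crf_post": 0.25}

print(weighted_vote(predictions, weights))
print(round(f_score(88.19, 91.33), 2))
```

With these made-up inputs, the second token is tagged `O` even though the highest-weighted system voted `I-PER`, because the other two systems together outweigh it. That is the basic reason a voted combination can outperform any single classifier.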

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ekbal, A., Bandyopadhyay, S. (2009). Improving the Performance of a NER System by Post-processing, Context Patterns and Voting. In: Li, W., Mollá-Aliod, D. (eds) Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy. ICCPOL 2009. Lecture Notes in Computer Science, vol 5459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00831-3_5

  • DOI: https://doi.org/10.1007/978-3-642-00831-3_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00830-6

  • Online ISBN: 978-3-642-00831-3

  • eBook Packages: Computer Science, Computer Science (R0)
