Named-Entity-Recognition (NER) for Tamil Language Using Margin-Infused Relaxed Algorithm (MIRA)

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2016)

Abstract

Named-Entity-Recognition (NER) is widely used as a foundation for Natural Language Processing (NLP) applications. There have been only a few previous attempts at building generic NER systems for the Tamil language. These attempts were based on machine-learning approaches such as Hidden Markov Models (HMM), Maximum Entropy Markov Models (MEMM), Support Vector Machines (SVM) and Conditional Random Fields (CRF). Among them, CRF has given the best NER accuracy for Tamil. This paper presents a novel approach to building a Tamil NER system using the Margin-Infused Relaxed Algorithm (MIRA). We also compare the performance of the MIRA and CRF algorithms for Tamil NER. When gazetteer, POS-tag and orthographic features are used, MIRA attains an F1-measure of 81.38% on the Tamil BBC news data, whereas CRF attains an F1-measure of only 79.13% with the same set of features. Our NER system outperforms all previous NER systems for the Tamil language.
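
As a rough illustration of the kind of update MIRA performs, the sketch below applies a simplified single-best, per-token MIRA step in Python. It is not the implementation used in the paper: the feature names, the label set, the unit margin and the clipping constant `C` are all assumptions made for the example, and a real sequence tagger would score whole label sequences rather than individual tokens.

```python
from collections import defaultdict


def score(weights, features, label):
    """Linear score of one label for a sparse feature dict."""
    return sum(weights[(label, name)] * value for name, value in features.items())


def mira_update(weights, features, gold, labels, C=1.0):
    """Apply one simplified single-best MIRA step; return the step size tau.

    The weights move just far enough for the gold label to beat its
    strongest rival by a margin of 1, with the step clipped at C.
    """
    # Strongest incorrect label under the current weights.
    rival = max((l for l in labels if l != gold),
                key=lambda l: score(weights, features, l))
    margin = score(weights, features, gold) - score(weights, features, rival)
    loss = max(0.0, 1.0 - margin)
    # Squared norm of the feature difference between gold and rival.
    sq_norm = 2.0 * sum(value * value for value in features.values())
    if loss == 0.0 or sq_norm == 0.0:
        return 0.0  # margin already satisfied, or nothing to update
    tau = min(C, loss / sq_norm)
    for name, value in features.items():
        weights[(gold, name)] += tau * value
        weights[(rival, name)] -= tau * value
    return tau


# Hypothetical token features: gazetteer hit, POS tag, orthographic cue.
weights = defaultdict(float)
token_features = {"in_gazetteer": 1.0, "pos=NNP": 1.0, "has_digit": 0.0}
step = mira_update(weights, token_features, "PER", ["O", "PER", "LOC"])
```

The step size shrinks as the gold label's margin over its rival grows, so tokens that are already classified with a sufficient margin leave the weights unchanged.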

Notes

  1. NN - Noun, NNC - Compound Noun, RB - Adverb, VM - Verb Main, SYM - Symbol, PRP - Personal Pronoun, JJ - Adjective, NNP - Proper Noun, PSP - Postposition, QC - Quantity Count, VAUX - Verb Auxiliary, DEM - Determiners, QF - Quantifiers, NEG - Negatives, QO - Quantity Order, WQ - Word Question, INTF - Intensifier, NNPC - Compound Proper Noun.

Acknowledgement

We would like to thank the AU-KBC research centre of Chennai, the Forum for Information Retrieval Evaluation (FIRE) and the Department of Registrations of Persons, Sri Lanka, for providing us with the language resources and tools needed to carry out this research.

Correspondence to Megala Uthayakumar.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Theivendiram, P. et al. (2018). Named-Entity-Recognition (NER) for Tamil Language Using Margin-Infused Relaxed Algorithm (MIRA). In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9623. Springer, Cham. https://doi.org/10.1007/978-3-319-75477-2_33

  • DOI: https://doi.org/10.1007/978-3-319-75477-2_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75476-5

  • Online ISBN: 978-3-319-75477-2

  • eBook Packages: Computer Science, Computer Science (R0)
