Named-Entity-Recognition (NER) for Tamil Language Using Margin-Infused Relaxed Algorithm (MIRA)

Theivendiram, Pranavan; Uthayakumar, Megala; Nadarasamoorthy, Nilusija; Thayaparan, Mokanarangan; Jayasena, Sanath; Dias, Gihan; Ranathunga, Surangika

doi:10.1007/978-3-319-75477-2_33

ORCID (Surangika Ranathunga): orcid.org/0000-0003-0701-0204

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9623)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Named-Entity-Recognition (NER) is widely used as a foundation for Natural Language Processing (NLP) applications. There have been few previous attempts to build generic NER systems for the Tamil language. These attempts were based on machine-learning approaches such as Hidden Markov Models (HMM), Maximum Entropy Markov Models (MEMM), Support Vector Machines (SVM) and Conditional Random Fields (CRF). Among them, CRF has been shown to give the best NER accuracy for Tamil. This paper presents a novel approach to building a Tamil NER system using the Margin-Infused Relaxed Algorithm (MIRA). We also compare the performance of the MIRA and CRF algorithms for Tamil NER. When gazetteer, POS-tag and orthographic features are used, MIRA attains an F1-measure of 81.38% on the Tamil BBC news data, whereas CRF attains an F1-measure of only 79.13% with the same feature set. Our NER system outperforms all previous NER systems for the Tamil language.
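
To make the learning rule concrete, the following is a minimal sketch of a single-best MIRA-style update applied to token-level classification. It is not the implementation evaluated in the paper: the feature template, the `MiraTagger` class, the label set, the step-size cap `C`, and the transliterated toy sentence are all assumptions made for illustration, and a full NER system would decode whole label sequences rather than classify tokens independently.

```python
from collections import defaultdict


def token_features(tokens, i, pos_tags, gazetteer):
    """Hypothetical feature template: word identity, POS tag,
    gazetteer membership and one simple orthographic cue."""
    word = tokens[i]
    feats = {"bias": 1.0, "word=" + word: 1.0, "pos=" + pos_tags[i]: 1.0}
    if word in gazetteer:
        feats["in_gazetteer"] = 1.0
    if any(ch.isdigit() for ch in word):
        feats["has_digit"] = 1.0
    return feats


class MiraTagger:
    """Multiclass classifier with one sparse weight vector per label."""

    def __init__(self, labels, C=1.0):
        self.labels = list(labels)
        self.C = C  # assumed cap on the step size
        self.weights = {label: defaultdict(float) for label in self.labels}

    def score(self, feats, label):
        w = self.weights[label]
        return sum(w.get(name, 0.0) * value for name, value in feats.items())

    def predict(self, feats):
        return max(self.labels, key=lambda label: self.score(feats, label))

    def update(self, feats, gold):
        """Make the smallest weight change that gives the gold label a
        margin of 1 over its strongest competitor (step capped at C)."""
        rival = max((l for l in self.labels if l != gold),
                    key=lambda l: self.score(feats, l))
        violation = 1.0 - (self.score(feats, gold) - self.score(feats, rival))
        if violation <= 0.0:
            return 0.0  # margin already satisfied, weights unchanged
        # Squared norm of the difference between the two joint feature vectors.
        sq_norm = 2.0 * sum(value * value for value in feats.values())
        tau = min(self.C, violation / sq_norm)
        for name, value in feats.items():
            self.weights[gold][name] += tau * value
            self.weights[rival][name] -= tau * value
        return tau


# Toy data (assumed, not taken from the Tamil BBC news corpus).
tokens = ["Kamal", "visited", "Chennai", "yesterday"]
pos_tags = ["NNP", "VM", "NNP", "RB"]
gold_labels = ["PER", "O", "LOC", "O"]
gazetteer = {"Chennai"}

tagger = MiraTagger(["PER", "LOC", "O"])
for _ in range(5):  # a few passes over the single training sentence
    for i, gold in enumerate(gold_labels):
        tagger.update(token_features(tokens, i, pos_tags, gazetteer), gold)

predicted = [tagger.predict(token_features(tokens, i, pos_tags, gazetteer))
             for i in range(len(tokens))]
print(list(zip(tokens, predicted)))
```

The step size `tau` is what separates this family of updates from a plain perceptron: the weights move only as far as needed to satisfy the margin on the current example, which keeps each update conservative with respect to what was learned earlier.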

Notes

1. NN - Noun, NNC - Compound Noun, RB - Adverb, VM - Verb Main, SYM - Symbol, PRP - Personal Pronoun, JJ - Adjective, NNP - Proper Noun, PSP - Prepositions, QC - Quantity Count, VAUX - Verb Auxiliary, DEM - Determiners, QF - Quantifiers, NEG - Negatives, QO - Quantity Order, WQ - Word Question, INTF - Intensifier, NNPC - Compound Proper Noun.

Acknowledgement

We would like to thank the AU-KBC Research Centre, Chennai, the Forum for Information Retrieval Evaluation (FIRE), and the Department of Registrations of Persons, Sri Lanka, for providing the language resources and tools needed to carry out this research.

Author information

Authors and Affiliations

Department of Computer Science Engineering, University of Moratuwa, Moratuwa, Sri Lanka
Pranavan Theivendiram, Megala Uthayakumar, Nilusija Nadarasamoorthy, Mokanarangan Thayaparan, Sanath Jayasena, Gihan Dias & Surangika Ranathunga

Corresponding author

Correspondence to Megala Uthayakumar.

Editor information

Editors and Affiliations

CIC, Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

Cite this paper

Theivendiram, P. et al. (2018). Named-Entity-Recognition (NER) for Tamil Language Using Margin-Infused Relaxed Algorithm (MIRA). In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9623. Springer, Cham. https://doi.org/10.1007/978-3-319-75477-2_33

DOI: https://doi.org/10.1007/978-3-319-75477-2_33
Published: 21 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75476-5
Online ISBN: 978-3-319-75477-2
eBook Packages: Computer Science, Computer Science (R0)
