Malayalam POS Tagger—A Comparison Using SVM and HMM

Usha, K.; Pandian, S. Lakshmana

doi:10.1007/978-981-15-5788-0_40

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1176)


Abstract

Many part-of-speech (POS) taggers for the Malayalam language have been implemented using Support Vector Machine (SVM), Memory-Based Language Processing (MBLP), Hidden Markov Model (HMM), and similar techniques. The objective of this work is to find an improved POS tagger for Malayalam. It compares Malayalam POS taggers built with SVM and HMM. The tag set used is the popular Bureau of Indian Standards (BIS) tag set. A manually created data set of around 52,000 words was collected from various Malayalam news sites. The preprocessing steps applied to the news text are also described. POS tagging is then performed using SVM and HMM. Because POS tagging requires predicting multiple class labels, a multi-class SVM is used. The SVM approach also covers feature extraction, feature selection, and classification. Word sense disambiguation and misclassification of words are the two major issues identified with SVM. The HMM predicts the hidden tag sequence based on maximum observation likelihood, which reduces ambiguity and the misclassification rate.
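To make the HMM side of the comparison concrete, here is a minimal sketch of how a bigram HMM can recover a hidden tag sequence from observed words. It uses standard Viterbi decoding, which is an assumption: the abstract does not say which decoding procedure the authors used. The tag names, the English toy words, and every probability below are invented for illustration and are not taken from the paper's BIS-tagged Malayalam data.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely tag sequence for `words` under a bigram HMM.

    Probabilities missing from the tables are replaced by `floor`
    (a crude, assumed smoothing choice) so that log() stays defined.
    """
    if not words:
        return []

    def lp(p):
        return math.log(p if p > 0 else floor)

    # best[i][t]: best log-probability of any tag path ending in tag t at word i
    best = [{t: lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p[p].get(t, 0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], 0))
            back[i][t] = prev

    # Follow the back-pointers from the best final tag to the start.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Hypothetical toy model: two tags and hand-picked probabilities.
tags = ["NOUN", "VERB"]
start_p = {"NOUN": 0.8, "VERB": 0.2}
trans_p = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.6, "VERB": 0.4},
}
emit_p = {
    "NOUN": {"children": 0.5, "play": 0.1, "games": 0.4},
    "VERB": {"play": 0.9, "games": 0.1},
}

print(viterbi(["children", "play", "games"], tags, start_p, trans_p, emit_p))
```

In this toy model the word "play" can be emitted by either tag. The transition probabilities from the preceding noun are what tip the decision toward the verb reading, which is the kind of context-based disambiguation the abstract attributes to the HMM.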



Author information

Authors and Affiliations

Department of Computer Science and Engineering, Pondicherry Engineering College, Puducherry, 605014, India
K. Usha & S. Lakshmana Pandian

Authors

K. Usha
S. Lakshmana Pandian

Corresponding author

Correspondence to K. Usha.

Editor information

Editors and Affiliations

Department of Electronics and Communication Engineering, Shri Ramswaroop Memorial Group of Professional Colleges (SRMGPC), Lucknow, Uttar Pradesh, India
Vikrant Bhateja
Department of Computer Science and Information Engineering, National Dong Hwa University, Hualien, Taiwan
Sheng-Lung Peng
School of Computer Engineering, Kalinga Institute of Industrial Technology (KIIT), Bhubaneswar, Odisha, India
Suresh Chandra Satapathy
Department of Informatics, University of Leicester, Leicester, UK
Yu-Dong Zhang


About this paper

Cite this paper

Usha, K., Pandian, S.L. (2021). Malayalam POS Tagger—A Comparison Using SVM and HMM. In: Bhateja, V., Peng, SL., Satapathy, S.C., Zhang, YD. (eds) Evolution in Computational Intelligence. Advances in Intelligent Systems and Computing, vol 1176. Springer, Singapore. https://doi.org/10.1007/978-981-15-5788-0_40


DOI: https://doi.org/10.1007/978-981-15-5788-0_40
Published: 09 September 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5787-3
Online ISBN: 978-981-15-5788-0
eBook Packages: Intelligent Technologies and Robotics (R0)
