An Automatic POS Tagger System for Code Mixed Indian Social Media Text

Basisth, Nihar Jyoti; Sachan, Tushar; Kumari, Neha; Pandey, Shyambabu; Pakray, Partha

doi:10.1007/978-3-031-48879-5_21

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1956)

Included in the following conference series:

International Conference on Computational Intelligence in Communications and Business Analytics

Abstract

For a range of Natural Language Processing (NLP) applications, including Sentiment Analysis, Sarcasm Detection, Information Retrieval (IR), Question Answering, and Named Entity Identification, text derived from users' posts and comments on social media is a significant source of information. All such applications require part-of-speech (POS) tagging to add tag information to the raw text. Code-mixing, the natural tendency of social media users to write content in multiple languages, makes POS tagging difficult. In addition, the complex, free-style writing common on social media further increases the intricacy of the problem. To address this problem for Code-Mixed Indian social media text, a supervised algorithm using a Hidden Markov Model (HMM) with the Viterbi algorithm has been developed. The proposed system has been trained and tested on publicly accessible social media text in Indian languages (ILs), specifically Bengali, Telugu, English, and Hindi. The accuracy of the system-annotated tags has been assessed using the F-measure.
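The abstract names the technique but not its details, so the following is only a minimal sketch of a supervised HMM POS tagger with Viterbi decoding, plus a per-tag F-measure. Several choices here are assumptions, not the authors' method: add-one (Laplace) smoothing, lowercasing of tokens, a single reserved slot for unknown words, macro-averaged F1, and the helper names `train_hmm`, `viterbi`, and `macro_f1`. The romanized Hindi-English training sentences and the tag labels are hypothetical toy data, not drawn from the corpus used in the paper.

```python
import math
from collections import Counter, defaultdict

def train_hmm(tagged_sentences, alpha=1.0):
    """Estimate HMM transition/emission log-probabilities from tagged sentences.
    Assumption: add-alpha smoothed relative frequencies, lowercased tokens."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] counts; "<s>" marks sentence start
    emit = defaultdict(Counter)   # emit[tag][word] counts
    vocab = set()
    for sent in tagged_sentences:
        prev = "<s>"
        for word, tag in sent:
            w = word.lower()
            trans[prev][tag] += 1
            emit[tag][w] += 1
            vocab.add(w)
            prev = tag
    tags = sorted(emit)
    v = len(vocab) + 1  # +1 reserves probability mass for unseen words

    def log_trans(prev, tag):
        c = trans[prev]
        return math.log((c[tag] + alpha) / (sum(c.values()) + alpha * len(tags)))

    def log_emit(tag, word):
        c = emit[tag]
        return math.log((c[word.lower()] + alpha) / (sum(c.values()) + alpha * v))

    return tags, log_trans, log_emit

def viterbi(words, tags, log_trans, log_emit):
    """Return the most probable tag sequence for `words` (dynamic programming)."""
    if not words:
        return []
    # best[i][t]: log-prob of the best tag path for words[:i+1] ending in tag t
    best = [{t: log_trans("<s>", t) + log_emit(t, words[0]) for t in tags}]
    back = [{}]  # back[i][t]: predecessor tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(((p, best[i - 1][p] + log_trans(p, t)) for p in tags),
                              key=lambda x: x[1])
            best[i][t] = score + log_emit(t, words[i])
            back[i][t] = prev
    # Trace back from the highest-scoring final tag
    path = [max(best[-1], key=best[-1].get)]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

def macro_f1(gold, pred):
    """Macro-averaged F1 over all tags seen in gold or predicted sequences."""
    labels = set(gold) | set(pred)
    scores = []
    for lab in labels:
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical toy code-mixed (romanized Hindi-English) training data
TRAIN = [
    [("main", "PRON"), ("office", "NOUN"), ("ja", "VERB"), ("raha", "AUX"), ("hoon", "AUX")],
    [("tum", "PRON"), ("movie", "NOUN"), ("dekh", "VERB"), ("rahe", "AUX"), ("ho", "AUX")],
    [("main", "PRON"), ("movie", "NOUN"), ("dekh", "VERB"), ("raha", "AUX"), ("hoon", "AUX")],
]

if __name__ == "__main__":
    tags, lt, le = train_hmm(TRAIN)
    pred = viterbi("tum office ja rahe ho".split(), tags, lt, le)
    print(pred)  # expected: ['PRON', 'NOUN', 'VERB', 'AUX', 'AUX']
    print(macro_f1(["PRON", "NOUN", "VERB", "AUX", "AUX"], pred))
```

Working in log space avoids numerical underflow on long posts, and the back-pointer table is what lets Viterbi recover the best sequence in time linear in sentence length (quadratic in the number of tags) instead of enumerating every possible tag path.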

Acknowledgement

The work presented here is part of experiments being conducted under the Research Project Grant Ref. No. N-21/17/2020-NeGD, supported by the MeitY Quantum Computing Applications Lab (QCAL) and Amazon Braket. We also extend our gratitude to the Department of CSE, NIT Silchar, and the Center for Natural Language Processing for their support.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology Silchar, Silchar, Assam, India
Nihar Jyoti Basisth, Tushar Sachan, Neha Kumari, Shyambabu Pandey & Partha Pakray

Corresponding authors

Correspondence to Shyambabu Pandey or Partha Pakray.

Editor information

Editors and Affiliations

Kalyani Government Engineering College, Kalyani, India
Kousik Dasgupta
Assam University, Silchar, India
Somnath Mukhopadhyay
University of Kalyani, Kalyani, West Bengal, India
Jyotsna K. Mandal
Visvabharati University, Santiniketan, West Bengal, India
Paramartha Dutta

About this paper

Cite this paper

Basisth, N.J., Sachan, T., Kumari, N., Pandey, S., Pakray, P. (2024). An Automatic POS Tagger System for Code Mixed Indian Social Media Text. In: Dasgupta, K., Mukhopadhyay, S., Mandal, J.K., Dutta, P. (eds) Computational Intelligence in Communications and Business Analytics. CICBA 2023. Communications in Computer and Information Science, vol 1956. Springer, Cham. https://doi.org/10.1007/978-3-031-48879-5_21

DOI: https://doi.org/10.1007/978-3-031-48879-5_21
Published: 30 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48878-8
Online ISBN: 978-3-031-48879-5
eBook Packages: Computer Science (R0)
