An Automatic POS Tagger System for Code Mixed Indian Social Media Text

  • Conference paper
Computational Intelligence in Communications and Business Analytics (CICBA 2023)

Abstract

Text derived from users’ posts and comments on social media is a significant source of information for a range of Natural Language Processing (NLP) applications, including Sentiment Analysis, Sarcasm Detection, Information Retrieval (IR), Question Answering, and Named Entity Identification. All such applications require part-of-speech (POS) tagging to add tag information to the raw text. Code-mixing, the natural tendency of social media users to post content in multiple languages, makes POS tagging difficult. In addition, sophisticated and freestyle writing further complicates the task. To address this problem, a supervised POS tagger for code-mixed Indian social media text has been developed using a Hidden Markov Model (HMM) with the Viterbi algorithm. The proposed system has been trained and tested on publicly accessible social media text in Indian languages (ILs), particularly Bengali, Telugu, English, and Hindi. The accuracy of the system-annotated tags has been assessed on the basis of the F-measure.
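
As a rough illustration of the kind of tagger the abstract describes, the following is a minimal sketch of a supervised bigram HMM decoded with the Viterbi algorithm. It is not the authors' implementation: the toy corpus, the tag names, the add-one smoothing, and the function names (train_hmm, log_prob, viterbi) are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus of code-mixed Hindi-English sentences (not from the paper).
TOY_CORPUS = [
    [("mujhe", "PRON"), ("yeh", "DET"), ("movie", "NOUN"), ("pasand", "ADJ"), ("hai", "VERB")],
    [("yeh", "DET"), ("song", "NOUN"), ("awesome", "ADJ"), ("hai", "VERB")],
    [("mujhe", "PRON"), ("yeh", "DET"), ("song", "NOUN"), ("pasand", "ADJ"), ("hai", "VERB")],
]

def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    emissions = defaultdict(Counter)    # tag -> counts of emitted word
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            word = word.lower()
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            vocab.add(word)
            prev = tag
    return transitions, emissions, sorted(emissions), vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (an assumed smoothing choice)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, model):
    """Return the most probable tag sequence for words under the model."""
    transitions, emissions, tags, vocab = model
    words = [w.lower() for w in words]
    if not words:
        return []
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 slot for unknown words
    # Each column maps tag -> (best log score so far, backpointer to previous tag).
    columns = [{
        t: (log_prob(transitions["<s>"], t, n_tags)
            + log_prob(emissions[t], words[0], n_words), None)
        for t in tags
    }]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max(
                (columns[-1][p][0] + log_prob(transitions[p], t, n_tags), p)
                for p in tags
            )
            column[t] = (score + log_prob(emissions[t], word, n_words), prev)
        columns.append(column)
    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]

model = train_hmm(TOY_CORPUS)
print(viterbi(["yeh", "movie", "awesome", "hai"], model))
```

On this toy corpus the call tags the unseen sentence as DET NOUN ADJ VERB. Working in log space avoids numerical underflow on longer posts; a real system would need a far larger annotated corpus and a more careful treatment of unknown and transliterated words.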

Notes

  1. http://amitavadas.com/Code-Mixing.html

Acknowledgement

The work presented here is a part of experiments being conducted under the Research Project Grant Ref. No. N-21/17/2020-NeGD supported by MeitY Quantum Computing Applications Lab (QCAL) and Amazon-braket. We also extend our gratitude to the Department of CSE, NIT Silchar, and the Center for Natural Language Processing for their support.

Author information

Corresponding authors

Correspondence to Shyambabu Pandey or Partha Pakray.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Basisth, N.J., Sachan, T., Kumari, N., Pandey, S., Pakray, P. (2024). An Automatic POS Tagger System for Code Mixed Indian Social Media Text. In: Dasgupta, K., Mukhopadhyay, S., Mandal, J.K., Dutta, P. (eds) Computational Intelligence in Communications and Business Analytics. CICBA 2023. Communications in Computer and Information Science, vol 1956. Springer, Cham. https://doi.org/10.1007/978-3-031-48879-5_21

  • DOI: https://doi.org/10.1007/978-3-031-48879-5_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-48878-8

  • Online ISBN: 978-3-031-48879-5

  • eBook Packages: Computer Science, Computer Science (R0)
