Inflectional and Derivational Hybrid Stemmer for Sentiment Analysis: A Case Study with Marathi Tweets

  • Conference paper
Recent Trends in Image Processing and Pattern Recognition (RTIP2R 2021)

Abstract

Sentiment Analysis (SA) is more complex in Marathi (an Indian language) than in European languages because of its inflected words and phrases and its comparatively limited resources. SA requires several preprocessing steps, viz., cleaning, stemming, and stopword removal, which reduce the dimensionality of the feature set. Stemming, a crucial preprocessing step, converts inflected words to a root or stem without morphological analysis. This work, the first of its kind, presents an inflectional and derivational hybrid stemmer for Marathi tweets that combines a rule-based approach with a dictionary look-up. To design and develop the stemmer, 10,000 sentences (103,789 words) were extracted from the OSCAR (https://oscar-corpus.com/) Marathi corpus. The stemmer achieved an average accuracy of 89.37% over ten random samples of 1,000 words each. Its performance is investigated using Paice’s parameters, i.e., under-stemming and over-stemming errors, Index Compression Factor (ICF), and Mean number of words per signature (MWC). A benchmark dataset of 4,245 Marathi political tweets was constructed to validate the system, and Marathi lexicons (adjective-adverb) built from SentiWordNet and Hindi SentiWordNet are used to classify the tweets as positive or negative. Several experiments measure the effect of the stemmer on lexicon-based, sentence-level SA for Marathi tweets. The results show that stemming improves accuracy from 67.52% to 74.49%, with an F-measure of 0.75. However, some adverse effects of stemming are also observed in the results.
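The combination of dictionary look-up and suffix rules, and the two compression measures named in the abstract, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy English word list, the exception dictionary, and the suffix list are all hypothetical stand-ins for the Marathi resources, which are not given in this preview. The sketch assumes the commonly used definitions ICF = (N − S) / N and MWC = N / S, where N is the number of distinct words and S the number of distinct stems.

```python
def hybrid_stem(word, exceptions, suffixes, min_stem_len=2):
    """Toy hybrid stemmer: dictionary look-up first, then suffix rules.

    `exceptions`, `suffixes`, and `min_stem_len` are illustrative
    assumptions, not values taken from the paper.
    """
    # Dictionary look-up handles irregular forms the rules cannot reach.
    if word in exceptions:
        return exceptions[word]
    # Rule-based step: strip the longest matching suffix, but never
    # shorten the word below the minimum stem length.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


def index_compression_factor(words, stem):
    """ICF = (N - S) / N over distinct words (N) and distinct stems (S)."""
    unique_words = set(words)
    unique_stems = {stem(w) for w in unique_words}
    return (len(unique_words) - len(unique_stems)) / len(unique_words)


def mean_words_per_class(words, stem):
    """MWC = N / S: average number of distinct words sharing one stem."""
    unique_words = set(words)
    unique_stems = {stem(w) for w in unique_words}
    return len(unique_words) / len(unique_stems)


# Hypothetical toy data (English placeholders for readability).
EXCEPTIONS = {"ran": "run"}
SUFFIXES = ["s", "ed", "ing"]
WORDS = ["walk", "walks", "walked", "walking", "ran", "run", "runs", "happy"]


def toy_stem(word):
    return hybrid_stem(word, EXCEPTIONS, SUFFIXES)


icf = index_compression_factor(WORDS, toy_stem)
mwc = mean_words_per_class(WORDS, toy_stem)
print(f"ICF = {icf:.3f}, MWC = {mwc:.3f}")
```

On this toy list the eight distinct words collapse to three stems, so a higher ICF and MWC indicate a stronger stemmer; neither measure says whether the conflations are correct, which is why under-stemming and over-stemming errors are reported alongside them.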

Author information

Corresponding author

Correspondence to Rupali S. Patil.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Patil, R.S., Kolhe, S.R. (2022). Inflectional and Derivational Hybrid Stemmer for Sentiment Analysis: A Case Study with Marathi Tweets. In: Santosh, K., Hegadi, R., Pal, U. (eds) Recent Trends in Image Processing and Pattern Recognition. RTIP2R 2021. Communications in Computer and Information Science, vol 1576. Springer, Cham. https://doi.org/10.1007/978-3-031-07005-1_23

  • DOI: https://doi.org/10.1007/978-3-031-07005-1_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-07004-4

  • Online ISBN: 978-3-031-07005-1

  • eBook Packages: Computer Science, Computer Science (R0)
