Inflectional and Derivational Hybrid Stemmer for Sentiment Analysis: A Case Study with Marathi Tweets

Patil, Rupali S.; Kolhe, Satish R.

doi:10.1007/978-3-031-07005-1_23

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1576)

Included in the following conference series:

International Conference on Recent Trends in Image Processing and Pattern Recognition

Abstract

Sentiment Analysis (SA) is more complex in Marathi, an Indian language, than in European languages because of its highly inflected words and phrases and its comparatively limited resources. SA needs several preprocessing steps, viz., cleaning, stemming, and stopword removal, which reduce the dimensionality of the feature set. Stemming, a crucial preprocessing step, converts inflected words to a root or stem without morphological analysis. This work presents what the authors describe as the first inflectional and derivational hybrid stemmer for Marathi tweets, combining a rule-based approach with a dictionary look-up. To design and develop the stemmer, 10,000 sentences (103,789 words) are extracted from the OSCAR (https://oscar-corpus.com/) Marathi corpus. The stemmer achieves an average accuracy of 89.37% over ten random samples of 1,000 words each. Its performance is investigated using Paice’s parameters, i.e., under-stemming and over-stemming errors, together with the Index Compression Factor (ICF) and the Mean number of words per signature (MWC). A benchmark dataset of 4,245 Marathi political tweets is constructed to validate the system, and Marathi adjective and adverb lexicons built from SentiWordNet and Hindi SentiWordNet are used to classify the tweets as positive or negative. Several experiments measure the effect of the stemmer on lexicon-based, sentence-level SA for Marathi tweets. The results show that stemming improves accuracy from 67.52% to 74.49%, with an F-measure of 0.75. However, some adverse effects of stemming are also observed in the results.
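The abstract does not list the stemmer's rules or dictionary, so the following is only a minimal sketch of the general idea of a hybrid stemmer (dictionary look-up first, suffix stripping second) and of how ICF and MWC are commonly computed. The three suffixes, the single exception entry, the minimum stem length, and the function names `stem` and `compression_stats` are all illustrative assumptions, not the authors' rule set. The formulas used are the commonly cited ones, ICF = (N - S) / N and MWC = N / S, where N is the number of distinct words and S the number of distinct stems; the paper may define them slightly differently.

```python
# Illustrative sketch only: the suffix rules and dictionary below are toy
# assumptions, not the rule set or lexicon used in the paper.

# Dictionary look-up for irregular forms that suffix rules cannot handle.
EXCEPTIONS = {"गेला": "जा"}  # toy entry: "went" -> "go"

# Toy suffix list, written as escapes so the code points are unambiguous.
SUFFIXES = (
    "\u093e\u0924",  # vowel sign AA + TA, as in घरात
    "\u0947",        # vowel sign E, as in घरे
    "\u0940",        # vowel sign II, as in घरी
)
MIN_STEM_LEN = 2  # keep at least two code points to limit over-stemming


def stem(word: str) -> str:
    """Return a stem: dictionary look-up first, then longest-suffix stripping."""
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word  # no rule applied; the word is treated as its own stem


def compression_stats(words):
    """Return (ICF, MWC) for a collection of words under `stem`."""
    unique_words = set(words)
    if not unique_words:
        raise ValueError("need at least one word")
    unique_stems = {stem(w) for w in unique_words}
    n, s = len(unique_words), len(unique_stems)
    icf = (n - s) / n  # fraction by which the vocabulary shrinks
    mwc = n / s        # average number of distinct words per stem
    return icf, mwc


sample = ["घर", "घरात", "घरे", "घरी", "गेला"]
icf, mwc = compression_stats(sample)
```

On this toy sample, five distinct words collapse to two stems, so a higher ICF and MWC indicate a more aggressive stemmer. Neither measure says whether the merges are correct, which is why the under-stemming and over-stemming errors are reported alongside them.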

Author information

Authors and Affiliations

School of Computer Sciences, Kavayitri Bahinabai Chaudhari North Maharashtra University, Jalgaon, 425001, Maharashtra, India
Rupali S. Patil & Satish R. Kolhe

Corresponding author

Correspondence to Rupali S. Patil.

Editor information

Editors and Affiliations

University of South Dakota, Vermillion, SD, USA
KC Santosh
Central University of Karnataka, Gulbarga, India
Ravindra Hegadi
Indian Statistical Institute, Kolkata, India
Umapada Pal

About this paper

Cite this paper

Patil, R.S., Kolhe, S.R. (2022). Inflectional and Derivational Hybrid Stemmer for Sentiment Analysis: A Case Study with Marathi Tweets. In: Santosh, K., Hegadi, R., Pal, U. (eds) Recent Trends in Image Processing and Pattern Recognition. RTIP2R 2021. Communications in Computer and Information Science, vol 1576. Springer, Cham. https://doi.org/10.1007/978-3-031-07005-1_23

DOI: https://doi.org/10.1007/978-3-031-07005-1_23
Published: 22 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-07004-4
Online ISBN: 978-3-031-07005-1
eBook Packages: Computer Science, Computer Science (R0)
