Abstract
Sentiment Analysis (SA) is more complex in Marathi, an Indian language, than in European languages because of its highly inflected words and phrases and its comparatively limited resources. SA requires several preprocessing steps, namely cleaning, stemming, and stopword removal, which reduce the dimensionality of the feature set. Stemming, a crucial preprocessing step, converts inflected words to a root or stem without morphological analysis. This work presents what the authors describe as the first inflectional and derivational hybrid stemmer for Marathi tweets, combining a rule-based approach with a dictionary look-up. To design and develop the stemmer, 10,000 sentences (103,789 words) were extracted from the OSCAR (https://oscar-corpus.com/) Marathi corpus. The stemmer achieved an average accuracy of 89.37% over ten random samples of 1,000 words each. Its performance is evaluated using Paice's under-stemming and over-stemming errors, the Index Compression Factor (ICF), and the mean number of words per conflation class (MWC). To validate the system, a benchmark dataset of 4,245 Marathi political tweets was constructed, and Marathi adjective-adverb lexicons built from SentiWordNet and Hindi SentiWordNet were used to classify the tweets as positive or negative. Several experiments measure the effect of the stemmer on lexicon-based, sentence-level SA of Marathi tweets. With stemming, accuracy improved from 67.52% to 74.49%, with an F-measure of 0.75. However, some adverse effects of stemming were also observed in the results.
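The two stemmer-strength measures named in the abstract can be illustrated with a short sketch. It assumes the definitions commonly attributed to Frakes and Fox, ICF = (N - S) / N and MWC = N / S, where N is the number of distinct words before stemming and S is the number of distinct stems after stemming; the exact formulation used in the chapter is not shown in the abstract. The function name `stemmer_strength` and the toy English word list are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def stemmer_strength(word_to_stem):
    """Return (ICF, MWC) for a mapping of distinct words to their stems.

    Assumed definitions (not taken from the chapter itself):
      ICF = (N - S) / N   fraction by which stemming shrinks the vocabulary
      MWC = N / S         mean number of words per conflation class
    """
    if not word_to_stem:
        raise ValueError("word_to_stem must contain at least one word")

    # Group the distinct words into conflation classes keyed by stem.
    classes = defaultdict(set)
    for word, stem in word_to_stem.items():
        classes[stem].add(word)

    n_words = len(word_to_stem)  # N: distinct words before stemming
    n_stems = len(classes)       # S: distinct stems after stemming

    icf = (n_words - n_stems) / n_words
    mwc = n_words / n_stems
    return icf, mwc


# Hypothetical toy vocabulary: 6 distinct words conflated into 3 stems.
toy = {
    "connect": "connect",
    "connected": "connect",
    "connecting": "connect",
    "run": "run",
    "running": "run",
    "tree": "tree",
}
icf, mwc = stemmer_strength(toy)
print(f"ICF = {icf:.2f}, MWC = {mwc:.2f}")
```

A higher ICF and MWC indicate a stronger (more aggressive) stemmer; neither measure says whether the conflations are correct, which is why they are reported alongside the under-stemming and over-stemming errors.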
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Patil, R.S., Kolhe, S.R. (2022). Inflectional and Derivational Hybrid Stemmer for Sentiment Analysis: A Case Study with Marathi Tweets. In: Santosh, K., Hegadi, R., Pal, U. (eds) Recent Trends in Image Processing and Pattern Recognition. RTIP2R 2021. Communications in Computer and Information Science, vol 1576. Springer, Cham. https://doi.org/10.1007/978-3-031-07005-1_23
DOI: https://doi.org/10.1007/978-3-031-07005-1_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-07004-4
Online ISBN: 978-3-031-07005-1
eBook Packages: Computer Science (R0)