Classification of Offensive Tweet in Marathi Language Using Machine Learning Models

Kumari, Archana; Garge, Archana; Raj, Priyanshu; Kumar, Gunjan; Singh, Jyoti Prakash; Alryalat, Mohammad

doi:10.1007/978-3-031-48876-4_20


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1955)

Included in the following conference series:

International Conference on Computational Intelligence in Communications and Business Analytics

Abstract

Offensive language identification is essential to making social media a safe place to share one's views. This work proposes a model that automatically classifies tweets in a low-resource language, Marathi, as offensive or not offensive. Marathi is spoken in Western India. As a low-resource language, it lacks a comprehensive list of stopwords and a proper stemmer. To fill this gap, we created a list of stopwords for stopword removal and a list of suffixes for identifying the root word in Marathi. Two feature-extraction methods, Label Vectorizer and term frequency-inverse document frequency (TF-IDF) Vectorizer, are applied to the text, and the resulting features are used with six conventional machine learning classifiers to classify a Marathi tweet as offensive or non-offensive.
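The preprocessing and TF-IDF steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stopword and suffix entries are a handful of hypothetical examples (the lists built for the paper are not reproduced here), the helper names `strip_suffix`, `preprocess`, and `tfidf` are invented for this sketch, and the weighting is the plain tf × log(N/df) form, which may differ from the smoothed variant a library vectorizer would apply. The classification stage is omitted.

```python
import math
from collections import Counter

# Hypothetical, illustrative entries only; NOT the lists created in the paper.
STOPWORDS = {"आणि", "आहे", "हा", "ही", "हे", "तो", "ती", "या"}
SUFFIXES = ["मध्ये", "ला", "ने", "चा", "ची", "चे"]


def strip_suffix(token, min_stem_len=2):
    """Remove the longest matching suffix, keeping at least min_stem_len characters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem_len:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Whitespace-tokenize, drop stopwords, then strip suffixes."""
    return [strip_suffix(tok) for tok in text.split() if tok not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = (term count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return vectors


if __name__ == "__main__":
    tweets = ["हा मुलगा घराला", "ती शाळेमध्ये आहे"]
    tokenized = [preprocess(t) for t in tweets]
    for tokens, vector in zip(tokenized, tfidf(tokenized)):
        print(tokens, vector)
```

Simple suffix stripping of this kind can over-stem (any word that happens to end in a listed suffix is truncated), which is why the sketch enforces a minimum stem length; the resulting sparse vectors would then be passed to whichever conventional classifier is being evaluated.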

Notes

1. https://hasocfire.github.io/hasoc/2022/dataset.html

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology Patna, Patna, Bihar, India
Archana Kumari, Archana Garge, Priyanshu Raj, Gunjan Kumar & Jyoti Prakash Singh
Al-Balqa Applied University, Salt, Jordan
Mohammad Alryalat

Corresponding author

Correspondence to Gunjan Kumar.

Editor information

Editors and Affiliations

Kalyani Government Engineering College, Kalyani, India
Kousik Dasgupta
Assam University, Silchar, India
Somnath Mukhopadhyay
University of Kalyani, Kalyani, West Bengal, India
Jyotsna K. Mandal
Visvabharati University, Santiniketan, West Bengal, India
Paramartha Dutta

About this paper

Cite this paper

Kumari, A., Garge, A., Raj, P., Kumar, G., Singh, J.P., Alryalat, M. (2024). Classification of Offensive Tweet in Marathi Language Using Machine Learning Models. In: Dasgupta, K., Mukhopadhyay, S., Mandal, J.K., Dutta, P. (eds) Computational Intelligence in Communications and Business Analytics. CICBA 2023. Communications in Computer and Information Science, vol 1955. Springer, Cham. https://doi.org/10.1007/978-3-031-48876-4_20

DOI: https://doi.org/10.1007/978-3-031-48876-4_20
Published: 30 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48875-7
Online ISBN: 978-3-031-48876-4
eBook Packages: Computer Science; Computer Science (R0)
