Abstract
Social media websites such as Twitter have become so indispensable that people use them almost daily to share their emotions, opinions, suggestions and thoughts. Motivated by this behavior, this study defines an approach to automatically classify tweets into two main classes, namely hate speech and non-hate speech. Such classification provides a valuable source of information for analyzing and understanding target audiences and spotting marketing trends. We thus propose HiSAT, a Hierarchical framework for Sentiment Analysis on Twitter data. Sentiments and opinions in tweets are highly unstructured and do not follow a well-defined sequence. They constitute heterogeneous data from many sources in different formats, and express positive, negative or neutral sentiment. Hence, in HiSAT we conduct Natural Language Processing, encompassing tokenization, stemming and lemmatization techniques that convert text to tokens, as well as Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) techniques that convert text sentences into numeric vectors. These vectors are then fed as inputs to machine learning algorithms within the HiSAT framework; more specifically, Random Forest, Logistic Regression and Naïve Bayes are used as binary text classifiers to detect hate speech and non-hate speech in the tweets. Results of experiments performed with the HiSAT framework show that Random Forest outperforms the others in predicting the correct labels, with accuracy above 95%. We present the HiSAT approach, its implementation and experiments, along with related work and ongoing research.
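To make the vectorization and classification steps named in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a four-tweet toy corpus invented for illustration, a simple regular-expression tokenizer, the plain log(N/df) form of TF-IDF, and a hand-written multinomial Naïve Bayes with add-one smoothing. Stemming, lemmatization, Random Forest and Logistic Regression are omitted, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Toy corpus invented for illustration (1 = hate speech, 0 = non-hate speech).
TWEETS = [
    ("i hate those people so much", 1),
    ("those people are awful and i hate them", 1),
    ("what a lovely sunny day", 0),
    ("i love this lovely song", 0),
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(tokens, vocab):
    """Return the raw term counts of one document over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]

def tf_idf(docs, vocab):
    """Weight relative term frequency by log(N / document frequency)."""
    n = len(docs)
    df = {term: sum(term in doc for doc in docs) for term in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            [(counts[t] / len(doc)) * math.log(n / df[t]) for t in vocab]
        )
    return vectors

def train_naive_bayes(bow_vectors, labels):
    """Fit a multinomial Naive Bayes model with add-one (Laplace) smoothing."""
    model = {}
    for label in set(labels):
        rows = [v for v, y in zip(bow_vectors, labels) if y == label]
        totals = [sum(col) for col in zip(*rows)]  # per-term counts in class
        denom = sum(totals) + len(totals)          # smoothed normalizer
        model[label] = (
            math.log(len(rows) / len(labels)),            # log prior
            [math.log((c + 1) / denom) for c in totals],  # log likelihoods
        )
    return model

def predict(model, bow_vector):
    """Pick the label with the highest posterior log-probability."""
    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(c * w for c, w in zip(bow_vector, log_like))
    return max(model, key=score)

docs = [tokenize(text) for text, _ in TWEETS]
labels = [label for _, label in TWEETS]
vocab = sorted({tok for doc in docs for tok in doc})

bow = [bag_of_words(doc, vocab) for doc in docs]  # BoW count vectors
weights = tf_idf(docs, vocab)                     # TF-IDF vectors

model = train_naive_bayes(bow, labels)
prediction = predict(model, bag_of_words(tokenize("i hate them"), vocab))
```

Here the classifier is trained on the BoW counts, while the TF-IDF vectors are computed only to show the alternative representation. In practice, a library such as scikit-learn would likely replace these hand-written functions, and the toy corpus is far too small to say anything about accuracy.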
Acknowledgments and Disclaimer
Dr. Jiayin Wang and Dr. Aparna Varde acknowledge a grant from the US National Science Foundation NSF MRI: Acquisition of a High-Performance GPU Cluster for Research and Education. Award Number 2018575. Dr. Aparna Varde is a visiting researcher at Max Planck Institute for Informatics, Saarbrucken, Germany, in the research group of Dr. Gerhard Weikum, during the academic year 2021–2022, including a sabbatical visit. The authors acknowledge the CSAM Dean’s Office Travel Grant from Montclair State University to support attending this conference. The authors would like to make the disclaimer that the opinions expressed, analyzed and presented in this work are obtained from knowledge discovery by mining the concerned data only. These do not reflect the personal or professional views of the authors.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kommu, A., Patel, S., Derosa, S., Wang, J., Varde, A.S. (2023). HiSAT: Hierarchical Framework for Sentiment Analysis on Twitter Data. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2022. Lecture Notes in Networks and Systems, vol 542. Springer, Cham. https://doi.org/10.1007/978-3-031-16072-1_28
DOI: https://doi.org/10.1007/978-3-031-16072-1_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16071-4
Online ISBN: 978-3-031-16072-1
eBook Packages: Intelligent Technologies and Robotics (R0)