research-article

Topic Sentiment Analysis for Twitter Data in Indian Languages Using Composite Kernel SVM and Deep Learning

Published: 25 August 2022

Abstract

Sentiment analysis of public opinion on social networks such as Twitter and Facebook can provide valuable information with a wide range of applications. However, the efficiency and accuracy of automated methods for Twitter sentiment analysis are hindered by the special characteristics of Twitter data, which is generally noisy and high-dimensional and has complex syntactic and semantic structures. Sentiment analysis of Twitter data in Indian languages is more challenging still because the data is multilingual and code-mixed. In this article, we propose several composite kernel functions, each of which is used with a Support Vector Machine (SVM) to build a model for topic sentiment analysis of Twitter data in Indian languages. Each composite kernel function is constructed as a weighted sum of multiple single kernel functions that we define. In addition to the proposed composite kernel SVM method, we use several state-of-the-art deep learning classifiers for topic sentiment classification. Because no suitable Twitter dataset in Indian languages was available for our experiments, we developed our own datasets by collecting tweets related to five different Twitter trending topics in India. To demonstrate the robustness and generalization capability of the proposed models, we also evaluate them on the US airline Twitter dataset, a publicly available English benchmark. The empirical study shows that the proposed composite kernel SVM method is effective for the sentiment classification task. On the Indian language datasets, the proposed composite kernel SVM method achieves the highest average accuracy, 74%, and the highest average F-score, 0.73, whereas the deep learning-based method achieves an average accuracy of 71.31% and an average F-score of 0.70. On the US airline Twitter dataset, the proposed composite kernel SVM method achieves an average accuracy of 83% and an average F-score of 0.82, both higher than those of the deep learning-based method.
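
The weighted-sum construction mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the single kernels or the weights, so everything here is an assumption chosen for illustration: the linear and RBF kernels, the weights 0.7 and 0.3, the `gamma` value, the toy feature vectors, and the helper names `composite_kernel` and `gram_matrix`.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, y: Vector) -> float:
    """Dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x: Vector, y: Vector, gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel; gamma is an illustrative value."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def composite_kernel(kernels: Sequence[Kernel], weights: Sequence[float]) -> Kernel:
    """Return k(x, y) = sum_i w_i * k_i(x, y) for non-negative weights w_i."""
    if len(kernels) != len(weights):
        raise ValueError("one weight is required per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")

    def combined(x: Vector, y: Vector) -> float:
        return sum(w * k(x, y) for w, k in zip(weights, kernels))

    return combined


def gram_matrix(kernel: Kernel, samples: Sequence[Vector]) -> List[List[float]]:
    """Evaluate the kernel on every pair of samples."""
    return [[kernel(a, b) for b in samples] for a in samples]


# Toy feature vectors standing in for vectorized tweets (hypothetical data).
samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernel = composite_kernel([linear_kernel, rbf_kernel], [0.7, 0.3])
gram = gram_matrix(kernel, samples)
```

A non-negative weighted sum of valid (positive semidefinite) kernels is itself a valid kernel, which is why the sketch rejects negative weights. A Gram matrix built this way could be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.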


          • Published in

            ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 5
            September 2022
            486 pages
            ISSN: 2375-4699
            EISSN: 2375-4702
            DOI: 10.1145/3533669

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 25 August 2022
            • Online AM: 25 May 2022
            • Revised: 1 February 2022
            • Accepted: 1 February 2022
            • Received: 1 April 2021

            Qualifiers

            • research-article
            • Refereed