research-article

Topic Sentiment Analysis for Twitter Data in Indian Languages Using Composite Kernel SVM and Deep Learning

Authors:
Shuverthi Maity

Jadavpur University, Kolkata, West Bengal, India

ORCID: 0000-0002-2726-2647

Kamal Sarkar

Jadavpur University, Kolkata, West Bengal, India

ORCID: 0000-0002-0689-3976

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 5, Article No. 109, pp. 1–35. https://doi.org/10.1145/3519297

Published: 25 August 2022

Abstract

Sentiment analysis of public opinion on social networks such as Twitter and Facebook yields valuable information with a wide range of applications. However, the efficiency and accuracy of automated methods for Twitter sentiment analysis are hindered by the special characteristics of Twitter data, which is generally noisy and high-dimensional and has complex syntactic and semantic structures. Sentiment analysis of Twitter data in Indian languages is more challenging still because the data is multilingual and code-mixed. In this article, we propose several composite kernel functions, each of which is used with a Support Vector Machine (SVM) to build a model for topic sentiment analysis of Twitter data in Indian languages. Each composite kernel function is constructed as a weighted sum of multiple single kernel functions that we define. In addition to the proposed composite kernel SVM method, we use several state-of-the-art deep learning classifiers for topic sentiment classification. Since no suitable Twitter dataset in Indian languages is available for our experiments, we developed our own datasets by collecting tweets related to five Twitter trending topics in India. To assess the robustness and generalization capability of the proposed models, we also evaluate them on the US airline Twitter dataset, a publicly available English benchmark. The empirical study shows that the proposed composite kernel SVM method is effective for the sentiment classification task. On the Indian language datasets, the proposed composite kernel SVM method achieves the highest average accuracy of 74% and the highest average F-score of 0.73, whereas the deep learning-based method achieves an average accuracy of 71.31% and an average F-score of 0.70. On the US airline Twitter dataset, the proposed composite kernel SVM method achieves an average accuracy of 83% and an average F-score of 0.82, both higher than those of the deep learning-based method.
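
The weighted-sum construction described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the authors' single kernels or weights, so everything below is an assumption made for illustration: the linear and RBF kernels stand in for the article's own kernels, and the names `composite_kernel`, `gram_matrix`, the weights 0.7 and 0.3, and the toy feature vectors are hypothetical. Non-negative weights are enforced because a non-negative weighted sum of valid (positive semi-definite) kernels is itself a valid kernel.

```python
import math


def linear_kernel(x, y):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative choice."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def composite_kernel(kernels, weights):
    """Return k(x, y) = sum_i weights[i] * kernels[i](x, y)."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative to keep the kernel valid")

    def k(x, y):
        return sum(w * kern(x, y) for w, kern in zip(weights, kernels))

    return k


def gram_matrix(kernel, samples):
    """Pairwise kernel values for a list of feature vectors."""
    return [[kernel(a, b) for b in samples] for a in samples]


# Toy feature vectors standing in for vectorized tweets (hypothetical data).
features = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]

k = composite_kernel([linear_kernel, rbf_kernel], [0.7, 0.3])
gram = gram_matrix(k, features)
```

A Gram matrix built this way could then be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`; whether the authors proceeded this way is not stated in the abstract.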

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
    1. Machine learning approaches
      1. Kernel methods
        1. Support vector machines
      2. Neural networks
2. Information systems
  1. World Wide Web

Published in
ACM Transactions on Asian and Low-Resource Language Information Processing Volume 21, Issue 5
September 2022
486 pages
ISSN:2375-4699
EISSN:2375-4702
DOI:10.1145/3533669
Editor:
Imed Zitouni
Google, USA
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 25 August 2022
- Online AM: 25 May 2022
- Revised: 1 February 2022
- Accepted: 1 February 2022
- Received: 1 April 2021
Author Tags
Kernel methods
composite kernel
SVM
deep learning
sentiment classification
Indian languages
code-mixed
Qualifiers
- research-article
- Refereed