Abstract
Sentiment analysis of public opinion on social networks such as Twitter or Facebook yields valuable information with a wide range of applications. However, the efficiency and accuracy of automated methods for Twitter sentiment analysis are hindered by the special characteristics of Twitter data, which is generally noisy and high-dimensional and has complex syntactic and semantic structures. Sentiment analysis of Twitter data in Indian languages is more challenging still, because the data is multilingual and code-mixed. In this article, we propose several composite kernel functions, each of which is used with a Support Vector Machine (SVM) to build a model for topic sentiment analysis of Twitter data in Indian languages. Each composite kernel function is constructed as a weighted sum of multiple single kernel functions that we define. In addition to the proposed composite kernel SVM method, we use several state-of-the-art deep learning classifiers for topic sentiment classification. Since no suitable Twitter dataset in Indian languages is available for our experiments, we developed our own datasets by collecting tweets related to five different Twitter trending topics in India. To assess the robustness and generalization capability of the proposed models, we also evaluate them on the US airline Twitter dataset, a publicly available English benchmark dataset. The empirical study shows that the proposed composite kernel SVM method is effective for the sentiment classification task. On the Indian language datasets, the proposed composite kernel SVM method achieves the highest average accuracy, 74%, and the highest average F-score, 0.73, whereas the deep learning-based method achieves an average accuracy of 71.31% and an average F-score of 0.70. On the US airline Twitter dataset, the proposed composite kernel SVM method achieves an average accuracy of 83% and an average F-score of 0.82, both higher than those of the deep learning-based method.
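The weighted-sum construction of a composite kernel can be illustrated with a minimal sketch. The abstract does not specify the single kernels, their weights, or the tweet features, so everything here is an assumption chosen for illustration: a linear kernel and an RBF kernel as the single kernels, the weights 0.7 and 0.3, the `gamma` value, and the toy two-dimensional vectors standing in for tweet features. The weights are restricted to non-negative values because a non-negative weighted sum of valid (positive semidefinite) kernels is itself a valid kernel.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, y: Vector) -> float:
    """Dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x: Vector, y: Vector, gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel; gamma = 0.5 is an illustrative value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def composite_kernel(kernels: Sequence[Kernel],
                     weights: Sequence[float]) -> Kernel:
    """Return K(x, y) = sum_i w_i * K_i(x, y) with non-negative weights."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")

    def combined(x: Vector, y: Vector) -> float:
        return sum(w * k(x, y) for w, k in zip(weights, kernels))

    return combined


def gram_matrix(kernel: Kernel, samples: Sequence[Vector]) -> List[List[float]]:
    """Pairwise kernel values for all samples (the SVM's Gram matrix)."""
    return [[kernel(a, b) for b in samples] for a in samples]


# Toy feature vectors (hypothetical stand-ins for tweet features).
samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical weights; in practice they would be tuned on held-out data.
kernel = composite_kernel([linear_kernel, rbf_kernel], [0.7, 0.3])
gram = gram_matrix(kernel, samples)

for row in gram:
    print([round(value, 4) for value in row])
```

A Gram matrix of this kind can be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, which expects the training Gram matrix in `fit` and the test-versus-training kernel values in `predict`.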
Index Terms
- Topic Sentiment Analysis for Twitter Data in Indian Languages Using Composite Kernel SVM and Deep Learning