Abstract
Sentiment analysis is crucial in systems such as opinion mining and prediction. Considerable research has applied various machine learning techniques to sentiment analysis. However, the high error rates reported in these studies can reduce the efficiency of the entire system. To overcome this problem, we introduce a novel big data and machine learning technique for sentiment analysis. The data are drawn from a large volume of datasets, which supports effective analysis. Noise in the data is eliminated using a data mining preprocessing step. From the cleaned sentiment data, effective features are selected using a greedy approach, and the selected features are then processed by a classifier called the cat swarm optimization-based long short-term memory neural network (CSO-LSTMNN). The classifier analyzes sentiment-related features using an optimization procedure modeled on cat behavior, which minimizes the error rate while examining features. The efficiency of the technique is evaluated experimentally in terms of error rate, precision, recall, and accuracy. Results obtained with greedy feature selection and the CSO-LSTMNN algorithm are compared with those of the particle swarm optimization (PSO) algorithm; CSO-LSTMNN outperforms PSO, achieving higher accuracy and a lower error rate.
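The four evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It assumes binary sentiment labels (1 for positive, 0 for negative) and the standard confusion-matrix definitions; the names `Metrics` and `evaluate` are hypothetical and are not taken from the article, which does not state its exact formulas here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Metrics:
    accuracy: float
    error_rate: float
    precision: float
    recall: float


def evaluate(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Metrics:
    """Compute accuracy, error rate, precision, and recall for binary labels.

    Assumption: standard definitions based on true/false positives and
    negatives; the article may use a different averaging scheme.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return Metrics(accuracy, 1.0 - accuracy, precision, recall)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the article.
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    predicted = [1, 1, 0, 0, 0, 1, 1, 0]
    print(evaluate(truth, predicted))
```

Error rate is taken here as one minus accuracy, so a classifier that raises accuracy lowers the error rate by the same amount, which matches the way the two measures are paired in the comparison with PSO.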
Acknowledgements
This research was supported by King Saud University, Deanship of Scientific Research, Community College Research Unit.
About this article
Cite this article
Alarifi, A., Tolba, A., Al-Makhadmeh, Z. et al. A big data approach to sentiment analysis using greedy feature selection with cat swarm optimization-based long short-term memory neural networks. J Supercomput 76, 4414–4429 (2020). https://doi.org/10.1007/s11227-018-2398-2
DOI: https://doi.org/10.1007/s11227-018-2398-2