Abstract
Sentiment analysis has become a popular research topic, especially for retrieving valuable information from various online environments. Most existing sentiment studies rely on supervised learning, which requires a sufficient amount of labeled data. In practice, however, labeled data for sentiment analysis are often scarce, because labeling large amounts of data is expensive and time-consuming. To handle scenarios with insufficient initial labeled data, we propose a novel semi-supervised model based on a dynamic threshold and multiple classifiers. In particular, the training data are auto-labeled iteratively by the proposed dynamic threshold algorithm, in which a dynamic threshold function sets the thresholds for selecting auto-labeled data while considering both the quality and the quantity of those data. In addition, the proposed weighted voting strategy combines multiple support vector machine (SVM) classifiers by taking into account the performance gaps among them. The performance of the proposed model is validated through experiments on real datasets. Compared with two existing models, the proposed model achieves the highest sentiment analysis accuracy across datasets with different sizes of initial labeled training data.
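The abstract does not state the dynamic threshold function or the voting weights, so the following is only a minimal Python sketch of the general idea under loudly stated assumptions: the linearly relaxing threshold schedule (`start`, `floor`, `step`), the per-iteration cap `max_count`, and the softmax weighting with `temperature` are hypothetical choices, not the formulas of the proposed model. Classifier confidences and validation accuracies are taken as plain inputs, so any SVM implementation could supply them. In a full self-training loop, the selected samples would be added to the labeled set and the classifiers retrained before the next iteration.

```python
"""Hypothetical sketch of (1) threshold-based selection of auto-labeled samples
and (2) performance-weighted voting over several classifiers.
The threshold schedule and the weighting rule are illustrative assumptions,
not the formulas from the paper."""
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def dynamic_threshold(iteration: int, start: float = 0.95,
                      floor: float = 0.70, step: float = 0.05) -> float:
    """Assumed schedule: strict at first, relaxed linearly per iteration,
    never below `floor` so that label quality stays bounded."""
    return max(floor, start - step * iteration)


def select_auto_labeled(confidences: Sequence[float],
                        predicted: Sequence[Hashable],
                        threshold: float,
                        max_count: int) -> List[Tuple[int, Hashable]]:
    """Pick unlabeled samples to auto-label in one iteration.

    Quality: only predictions with confidence >= threshold qualify.
    Quantity: at most `max_count` of the most confident ones are kept.
    Returns (sample_index, predicted_label) pairs.
    """
    passing = [i for i, c in enumerate(confidences) if c >= threshold]
    # Most confident first; the sort is stable, so ties keep index order.
    passing.sort(key=lambda i: confidences[i], reverse=True)
    return [(i, predicted[i]) for i in passing[:max_count]]


def performance_weights(accuracies: Sequence[float],
                        temperature: float = 0.05) -> List[float]:
    """Assumed rule: softmax over validation accuracies. A small temperature
    amplifies performance gaps, so stronger classifiers get more weight."""
    best = max(accuracies)
    # Subtract the best accuracy before exp() for numerical stability.
    raw = [math.exp((a - best) / temperature) for a in accuracies]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_vote(predictions: Sequence[Hashable],
                  weights: Sequence[float]) -> Hashable:
    """Return the label with the largest total weight across classifiers."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for label, w in zip(predictions, weights):
        totals[label] += w
    return max(totals, key=totals.__getitem__)


if __name__ == "__main__":
    # Iteration 0 uses the strictest threshold (0.95 under the assumed schedule).
    picked = select_auto_labeled([0.99, 0.60, 0.93, 0.97], [1, 0, 0, 1],
                                 threshold=dynamic_threshold(0), max_count=2)
    print(picked)  # [(0, 1), (3, 1)]

    # Three classifiers with different validation accuracies (made-up values).
    weights = performance_weights([0.80, 0.78, 0.70])
    print(weighted_vote([1, 0, 0], weights))  # 1
```

With equal weights, the three votes in the example would give label 0 by simple majority; the softmax weights let the most accurate classifier outvote the two weaker ones, which is one possible way for a voting rule to reflect performance gaps between classifiers.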






Notes
The dataset can be downloaded from the website: http://ai.stanford.edu/~amaas/data/sentiment/.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (NSFC) under Project 71502125.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Han, Y., Liu, Y. & Jin, Z. Sentiment analysis via semi-supervised learning: a model based on dynamic threshold and multi-classifiers. Neural Comput & Applic 32, 5117–5129 (2020). https://doi.org/10.1007/s00521-018-3958-3
DOI: https://doi.org/10.1007/s00521-018-3958-3