Abstract
In recent years, deep learning models such as Convolutional Neural Networks (CNN) and Long Short-Term Memory networks (LSTM) have been successfully applied to text sentiment analysis. However, class imbalance and unlabeled corpora still limit the accuracy of text sentiment classification. To overcome these two issues, we propose a new classification model named KSCB, which integrates K-means++, SMOTE, CNN and Bi-LSTM, for text sentiment analysis. The K-means++-SMOTE operation in KSCB (combining K-means++ and SMOTE) first clusters the sentiment text and then generates new corpora according to the imbalance ratio to adjust the data distribution. A loss function between K-means++-SMOTE and CNN-Bi-LSTM (combining CNN and Bi-LSTM) is then applied to construct end-to-end learning. Unlike other deep learning models, KSCB can adjust the data distribution for different sentiment corpora through KSCB optimization. We applied KSCB to balanced and imbalanced corpora, and the comparison results show that KSCB is better than or comparable to five other state-of-the-art methods in text sentiment classification. Moreover, ablation experiments on the balanced and imbalanced corpora demonstrate the effectiveness of KSCB in text sentiment analysis.
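The cluster-then-oversample idea behind K-means++-SMOTE can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes 2-D tuples as stand-ins for text embeddings, replaces SMOTE's k-nearest-neighbour step with interpolation between random pairs inside a cluster, and oversamples every cluster up to the size of the largest one. All function names (`kmeans_pp_init`, `kmeans`, `smote_like`, `balance_clusters`) are hypothetical, and the loss coupling to CNN-Bi-LSTM is not shown.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new centre is drawn with probability
    proportional to its squared distance from the nearest chosen centre."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        weights = [min(sq_dist(p, c) for c in centres) for p in points]
        if sum(weights) == 0:  # all points coincide with existing centres
            centres.append(rng.choice(points))
        else:
            centres.append(rng.choices(points, weights=weights, k=1)[0])
    return centres


def kmeans(points, k, rng, iters=20):
    """Lloyd's iterations from a k-means++ start; returns the clusters
    produced by the last assignment step."""
    centres = kmeans_pp_init(points, k, rng)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centres[i]))
            clusters[nearest].append(p)
        # Recompute each centre as the cluster mean; keep the old centre
        # when a cluster ends up empty.
        centres = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centres[i]
            for i, c in enumerate(clusters)
        ]
    return clusters


def smote_like(cluster, n_new, rng):
    """Simplified SMOTE (assumption: random pairs, not k-NN): each synthetic
    point lies on the segment between two points of the same cluster."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.choice(cluster), rng.choice(cluster)
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


def balance_clusters(points, k=2, seed=0):
    """Cluster the points, then oversample each non-empty cluster until it
    matches the largest one (assumed use of the imbalance ratio)."""
    rng = random.Random(seed)
    clusters = [c for c in kmeans(points, k, rng) if c]
    target = max(len(c) for c in clusters)
    return [c + smote_like(c, target - len(c), rng) for c in clusters]
```

Calling `balance_clusters(points, k=2)` on a list of coordinate tuples returns one list per non-empty cluster, all of equal length. Because each synthetic point is a convex combination of two existing points, the added samples stay inside the region already covered by their cluster.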


Acknowledgements
This work has been supported by the Yunnan Fundamental Research Projects (202001AT070024).
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Cite this article
Jiang, W., Zhou, K., Xiong, C. et al. KSCB: a novel unsupervised method for text sentiment analysis. Appl Intell 53, 301–311 (2023). https://doi.org/10.1007/s10489-022-03389-4
DOI: https://doi.org/10.1007/s10489-022-03389-4