KSCB: a novel unsupervised method for text sentiment analysis

Applied Intelligence

Abstract

In recent years, deep learning models such as Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks have been successfully applied to text sentiment analysis. However, class imbalance and unlabeled corpora still limit the accuracy of text sentiment classification. To overcome these two issues, we propose a new classification model named KSCB, which integrates K-means++, SMOTE, CNN and Bi-LSTM models, for text sentiment analysis. The K-means++-SMOTE operation in KSCB (combining K-means++ and SMOTE) is first used to cluster the sentiment text and then to generate new corpora according to the imbalance ratio, which adjusts the data distribution. A loss function between K-means++-SMOTE and CNN-Bi-LSTM (combining CNN and Bi-LSTM) is then applied to construct end-to-end learning. Unlike other deep learning models, KSCB can adjust the data distribution for different sentiment corpora via KSCB optimization. We have applied KSCB to balanced and imbalanced corpora, and the comparison results show that KSCB is better than or comparable to five other state-of-the-art methods in text sentiment classification. Moreover, ablation experiments on the balanced and imbalanced corpora demonstrate the effectiveness of KSCB in text sentiment analysis.
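The clustering-then-oversampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact procedure, so the function names (`kmeans`, `smote_like`, `balance`), the toy 2-D vectors standing in for text embeddings, the choice to oversample every cluster up to the size of the largest one, and the use of a single nearest neighbour for interpolation are all assumptions made for illustration. The CNN-Bi-LSTM classifier and the joint loss are not shown.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:  # every point coincides with a center
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd iterations started from K-means++ seeds; returns labels."""
    centers = kmeans_pp_init(points, k, rng)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels

def smote_like(members, n_new, rng):
    """SMOTE-style oversampling (simplified to one nearest neighbour):
    interpolate between a random member and its closest cluster-mate."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(members)
        others = [m for m in members if m is not base]
        if not others:  # a singleton cluster can only be duplicated
            synthetic.append(base)
            continue
        neighbor = min(others, key=lambda m: sq_dist(base, m))
        t = rng.random()
        synthetic.append(tuple(b + t * (n - b)
                               for b, n in zip(base, neighbor)))
    return synthetic

def balance(points, k=2, seed=0):
    """Cluster the points, then oversample each cluster up to the size of
    the largest one (assumed stand-in for the imbalance-ratio rule)."""
    rng = random.Random(seed)
    labels = kmeans(points, k, rng)
    clusters = [[p for p, lab in zip(points, labels) if lab == j]
                for j in range(k)]
    target = max(len(c) for c in clusters)
    return [c + smote_like(c, target - len(c), rng) if c else []
            for c in clusters]

# Toy data: eight vectors near the origin and two near (5, 5).
points = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (0.0, 0.3), (0.4, 0.2),
          (0.1, 0.0), (0.3, 0.3), (0.2, 0.1), (5.0, 5.1), (5.2, 4.9)]
balanced = balance(points, k=2, seed=0)
print([len(c) for c in balanced])
```

Because the synthetic points are interpolated only between members of the same cluster, they stay inside the region that cluster already occupies, which is the usual motivation for clustering before applying SMOTE.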

Acknowledgements

This work has been supported by the Yunnan Fundamental Research Projects (202001AT070024).

Author information

Corresponding authors

Correspondence to Chubin Ou or Junpeng Zhang.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jiang, W., Zhou, K., Xiong, C. et al. KSCB: a novel unsupervised method for text sentiment analysis. Appl Intell 53, 301–311 (2023). https://doi.org/10.1007/s10489-022-03389-4

  • DOI: https://doi.org/10.1007/s10489-022-03389-4
