KSCB: a novel unsupervised method for text sentiment analysis

Jiang, Weili; Zhou, Kangneng; Xiong, Chenchen; Du, Guodong; Ou, Chubin; Zhang, Junpeng

doi:10.1007/s10489-022-03389-4

Published: 15 April 2022

Applied Intelligence, Volume 53, pages 301–311 (2023)

Weili Jiang¹, Kangneng Zhou², Chenchen Xiong³, Guodong Du⁴, Chubin Ou⁵,⁶ & Junpeng Zhang⁷

Abstract

In recent years, deep learning models such as Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks have been successfully applied to text sentiment analysis. However, class imbalance and unlabeled corpora still limit the accuracy of text sentiment classification. To overcome these two issues, we propose a new classification model named KSCB, which integrates K-means++, SMOTE, CNN and Bi-LSTM models for text sentiment analysis. The K-means++-SMOTE operation in KSCB (combining K-means++ and SMOTE) is first used to cluster the sentiment text and then to generate new corpora according to the imbalance ratio, adjusting the data distribution. A loss function linking K-means++-SMOTE and CNN-Bi-LSTM (combining CNN and Bi-LSTM) is then applied to construct end-to-end learning. Unlike other deep learning models, KSCB can adjust the data distribution for different sentiment corpora through KSCB optimization. We applied KSCB to balanced and imbalanced corpora, and the comparison results show that KSCB is better than or comparable to five other state-of-the-art methods in text sentiment classification. Moreover, ablation experiments on the balanced and imbalanced corpora demonstrate the effectiveness of KSCB in text sentiment analysis.
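To make the K-means++-SMOTE step described in the abstract more concrete, the following is a minimal sketch of the general idea only: cluster unlabeled vectors with k-means++ seeding, treat the two clusters as pseudo-classes, measure the imbalance ratio, and interpolate new minority vectors SMOTE-style until the groups match in size. It is not the authors' implementation. The function names, the choice of two clusters, the neighbour count and the toy 2-D "embeddings" are all assumptions made for illustration, and the end-to-end loss and the CNN-Bi-LSTM classifier are not modelled here.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """k-means++ seeding: the first centre is uniform at random; each later
    centre is drawn with probability proportional to its squared distance
    from the nearest centre chosen so far."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        weights = [min(sq_dist(p, c) for c in centres) for p in points]
        centres.append(rng.choices(points, weights=weights, k=1)[0])
    return centres


def kmeans(points, k, rng, iters=20):
    """Lloyd iterations from a k-means++ start; returns one cluster id per point."""
    centres = kmeanspp_init(points, k, rng)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def smote(minority, n_new, rng, k_neighbours=3):
    """SMOTE-style oversampling: each synthetic vector is a random point on
    the segment between a minority sample and one of its nearest minority
    neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: sq_dist(base, p))[:k_neighbours]
        mate = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


def balance_by_clusters(points, rng):
    """Cluster into two pseudo-classes, then oversample the smaller one."""
    labels = kmeans(points, 2, rng)
    groups = [[p for p, lab in zip(points, labels) if lab == j] for j in (0, 1)]
    minority, majority = sorted(groups, key=len)
    imbalance_ratio = len(majority) / len(minority)  # majority size / minority size
    synthetic = smote(minority, len(majority) - len(minority), rng)
    return majority, minority + synthetic, imbalance_ratio


# Toy 2-D "embeddings" (an assumption): 12 vectors near the origin, 4 near (50, 50).
rng = random.Random(0)
data = [[0.1 * i, 0.1 * (i % 3)] for i in range(12)]
data += [[50.0 + 0.1 * i, 50.0 - 0.1 * i] for i in range(4)]
majority, minority, ratio = balance_by_clusters(data, rng)
print(len(majority), len(minority), ratio)
```

In this sketch the imbalance ratio is simply the majority cluster size divided by the minority cluster size, and it determines how many synthetic vectors are generated. In practice the vectors would be learned text representations rather than hand-made points, and a library implementation of k-means++ and SMOTE would likely replace these helper functions.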

Acknowledgements

This work has been supported by the Yunnan Fundamental Research Projects (202001AT070024).

Author information

Authors and Affiliations

College of Computer Science, Sichuan University, Chengdu, China
Weili Jiang
School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
Kangneng Zhou
National Engineering Research Center for Beijing Biochip Technology, Beijing, China
Chenchen Xiong
Department of Artificial Intelligence, Xiamen University, Xiamen, China
Guodong Du
Faculty of Medicine, Health and Human Sciences, Macquarie University, New South Wales, Australia
Chubin Ou
Weizhi Meditech (Foshan) Co., Ltd, Foshan, Guangdong Province, 528200, China
Chubin Ou
School of Engineering, Dali University, Dali, China
Junpeng Zhang

Corresponding authors

Correspondence to Chubin Ou or Junpeng Zhang.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jiang, W., Zhou, K., Xiong, C. et al. KSCB: a novel unsupervised method for text sentiment analysis. Appl Intell 53, 301–311 (2023). https://doi.org/10.1007/s10489-022-03389-4

Accepted: 14 February 2022
Published: 15 April 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s10489-022-03389-4

Keywords

K-means++, SMOTE, CNN, Bi-LSTM, End-to-end Learning, Text Sentiment Classification
