3WS-ITSC: Three-Way Sampling on Imbalanced Text Data for Sentiment Classification

  • Conference paper
Rough Sets (IJCRS 2022)

Abstract

Sentiment analysis is an important research direction in natural language processing. Data imbalance is a critical issue in text sentiment classification because it gives rise to high misclassification cost. This paper proposes a three-way sampling sentiment classification model for imbalanced text data to reduce that cost. Specifically, the model extracts boundary points through three-way sampling and works together with cost-sensitive learning to act on the sampled results. First, to reduce sampling time, the text data is converted into a one-dimensional vector by bag mapping. Second, three-way sampling is used to obtain boundary points that characterize the majority class. Finally, a sequential three-way sentiment classification algorithm predicts sentiment polarity. The experimental results show that the proposed model outperforms state-of-the-art sentiment classification methods when the test data is extremely imbalanced.
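
The abstract does not state the decision rule, so the following is only a minimal Python sketch of the generic three-way decision idea (accept, reject, or defer to a boundary region), not the authors' algorithm. The function name `three_way_partition`, the thresholds `alpha` and `beta`, and the example scores are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def three_way_partition(
    scores: Sequence[float], alpha: float = 0.7, beta: float = 0.3
) -> Tuple[List[int], List[int], List[int]]:
    """Split item indices into positive, boundary, and negative regions.

    scores: hypothetical per-item probabilities of the positive class.
    alpha, beta: assumed acceptance and rejection thresholds (beta < alpha).
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    positive, boundary, negative = [], [], []
    for i, s in enumerate(scores):
        if s >= alpha:
            positive.append(i)   # confident enough to accept
        elif s <= beta:
            negative.append(i)   # confident enough to reject
        else:
            boundary.append(i)   # uncertain: defer the decision
    return positive, boundary, negative


# Illustrative scores only; these are not results from the paper.
scores = [0.95, 0.10, 0.55, 0.72, 0.30, 0.41]
pos, bnd, neg = three_way_partition(scores)
print(pos, bnd, neg)
```

In a sequential setting, the items left in the boundary region would typically be re-examined at a later stage with more information. How the paper does this, and how it sets its thresholds from misclassification costs, is not described in the abstract.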

This work was supported by the National Natural Science Foundation of China (62006200); The Southwest Petroleum University Postgraduate English Course Construction Project (No. 2020QY04); Central Government Funds of Guiding Local Scientific and Technological Development (No. 2021ZYD0003).

Notes

  1. http://ai.stanford.edu/~amaas/data/sentiment/

  2. https://www.yelp.com/dataset

  3. http://snap.stanford.edu/data/web-Amazon-links.html

Author information

Corresponding author

Correspondence to Fan Min.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Fang, Y., Li, ZC., Yang, X., Min, F. (2022). 3WS-ITSC: Three-Way Sampling on Imbalanced Text Data for Sentiment Classification. In: Yao, J., Fujita, H., Yue, X., Miao, D., Grzymala-Busse, J., Li, F. (eds) Rough Sets. IJCRS 2022. Lecture Notes in Computer Science, vol 13633. Springer, Cham. https://doi.org/10.1007/978-3-031-21244-4_30

  • DOI: https://doi.org/10.1007/978-3-031-21244-4_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21243-7

  • Online ISBN: 978-3-031-21244-4

  • eBook Packages: Computer Science, Computer Science (R0)
