Abstract
Sentiment analysis is an important research direction in natural language processing. Data imbalance is a critical issue in text sentiment classification because it gives rise to high misclassification costs. This paper proposes a three-way sampling sentiment classification model for imbalanced text data to reduce the misclassification cost. Specifically, the model extracts boundary points through three-way sampling and applies cost-sensitive learning to the sampled results. First, to reduce sampling time, the text data is converted into a one-dimensional vector by bag mapping. Second, three-way sampling is used to obtain boundary points that characterize the majority class. Finally, a sequential three-way sentiment classification algorithm is used to predict sentiment polarity. The experimental results show that the proposed model outperforms state-of-the-art sentiment classification methods in the scenario of extremely imbalanced test data.
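The abstract does not give the model's decision rule, so the following is only a minimal sketch of the generic cost-sensitive three-way decision from decision-theoretic rough sets, on which sequential three-way classifiers are commonly built. It is not the authors' algorithm. The cost values, the `Costs` class, and the function names are hypothetical and chosen purely for illustration; the thresholds follow the standard formulas derived from a loss matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    """Hypothetical loss matrix: cost of action X when the true class is Y.

    Actions: P = accept as positive, B = defer (boundary), N = reject.
    True classes: P = positive (assumed minority), N = negative.
    """
    pp: float = 0.0   # accept a true positive
    bp: float = 2.0   # defer a true positive
    np_: float = 10.0  # reject a true positive (costly miss of the minority)
    pn: float = 4.0   # accept a true negative
    bn: float = 1.0   # defer a true negative
    nn: float = 0.0   # reject a true negative


def thresholds(c: Costs) -> tuple[float, float]:
    """Return (alpha, beta) from the standard decision-theoretic formulas."""
    alpha = (c.pn - c.bn) / ((c.pn - c.bn) + (c.bp - c.pp))
    beta = (c.bn - c.nn) / ((c.bn - c.nn) + (c.np_ - c.bp))
    return alpha, beta


def decide(p_positive: float, alpha: float, beta: float) -> str:
    """Three-way decision on the estimated probability of the positive class."""
    if p_positive >= alpha:
        return "positive"
    if p_positive <= beta:
        return "negative"
    return "boundary"  # deferred; a sequential model would revisit these


alpha, beta = thresholds(Costs())
labels = [decide(p, alpha, beta) for p in (0.7, 0.3, 0.05)]
```

Under these assumed costs, raising the penalty for rejecting a true positive lowers `beta`, so fewer minority-class samples are rejected outright and more are deferred to the boundary region for further processing.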
This work was supported by the National Natural Science Foundation of China (62006200); the Southwest Petroleum University Postgraduate English Course Construction Project (No. 2020QY04); and the Central Government Funds of Guiding Local Scientific and Technological Development (No. 2021ZYD0003).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Fang, Y., Li, ZC., Yang, X., Min, F. (2022). 3WS-ITSC: Three-Way Sampling on Imbalanced Text Data for Sentiment Classification. In: Yao, J., Fujita, H., Yue, X., Miao, D., Grzymala-Busse, J., Li, F. (eds) Rough Sets. IJCRS 2022. Lecture Notes in Computer Science, vol 13633. Springer, Cham. https://doi.org/10.1007/978-3-031-21244-4_30
DOI: https://doi.org/10.1007/978-3-031-21244-4_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21243-7
Online ISBN: 978-3-031-21244-4
eBook Packages: Computer Science, Computer Science (R0)