BSIL: A Brain Storm-Based Framework for Imbalanced Text Classification

Tian, Jiachen; Chen, Shizhan; Zhang, Xiaowang; Feng, Zhiyong

doi:10.1007/978-3-030-32236-6_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11839)

Included in the following conference series:

CCF International Conference on Natural Language Processing and Chinese Computing

Abstract

Neural networks are not always effective on imbalanced datasets in text classification, because most of them are designed under the assumption of balanced data. In this paper, we present BSIL, a novel framework built on brain storm optimization (BSO) that improves the capability of neural networks in imbalanced text classification. In BSIL, BSO's simulation of the human brainstorming process is used to sample imbalanced datasets in a reasonable way. First, we present an approach that generates multiple relatively balanced subsets of an imbalanced dataset by applying scrambling segmentation and global random sampling. Second, we introduce a parallel method that efficiently trains a classifier for each subset. Finally, we propose a decision-making layer that accepts the “suggestions” of all classifiers in order to reach the most reliable prediction. Experimental results show that BSIL combined with CNN, RNN, and self-attention models performs better than those models alone in imbalanced text classification.
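
The three steps named in the abstract (balanced subsets, parallel training, a decision layer) can be illustrated with a minimal sketch. This is only one plausible reading, not the authors' implementation: the function names `balanced_subsets`, `train`, and `bsil_predict` are hypothetical, the word-count classifier is a toy stand-in for a CNN, RNN, or self-attention model, and both the with-replacement minority sampling and the majority-vote decision rule are assumptions.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def balanced_subsets(majority, minority, k, seed=0):
    """Shuffle the majority class, cut it into k segments, and pair each
    segment with an equally sized random draw from the minority class."""
    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)                          # "scrambling"
    segments = [shuffled[i::k] for i in range(k)]  # "segmentation"
    subsets = []
    for seg in segments:
        # Assumption: draw from the whole minority class with replacement.
        drawn = [rng.choice(minority) for _ in range(len(seg))]
        subsets.append(seg + drawn)
    return subsets


def train(subset):
    """Toy classifier (assumption): per-label word counts."""
    counts = {}
    for text, label in subset:
        counts.setdefault(label, Counter()).update(text.split())

    def predict(text):
        words = text.split()
        # Score each label by how often it saw the words; sorted() fixes ties.
        return max(sorted(counts),
                   key=lambda lab: sum(counts[lab][w] for w in words))

    return predict


def bsil_predict(classifiers, text):
    """Decision layer (assumption): simple majority vote over all classifiers."""
    votes = Counter(clf(text) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Tiny imbalanced dataset: 12 majority samples, 2 minority samples.
majority = [(f"good service number {i}", "pos") for i in range(12)]
minority = [("bad broken refund", "neg"), ("bad late delivery", "neg")]

subsets = balanced_subsets(majority, minority, k=3)

# One classifier per subset, trained concurrently.
with ThreadPoolExecutor() as pool:
    classifiers = list(pool.map(train, subsets))

print(bsil_predict(classifiers, "bad refund"))
```

Each subset here holds four majority and four minority samples, so every classifier trains on balanced data while the ensemble as a whole still sees every majority sample once.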

Acknowledgments

This work is supported by the National Key Research and Development Program of China (2017YFB1401200, 2017YFC0908401) and the National Natural Science Foundation of China (61672377). Xiaowang Zhang is supported by the Peiyang Young Scholars in Tianjin University (2019XRX-0032).

Author information

Authors and Affiliations

College of Intelligence and Computing, Tianjin University, Tianjin, China
Jiachen Tian, Shizhan Chen, Xiaowang Zhang & Zhiyong Feng
Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin, China
Jiachen Tian, Shizhan Chen, Xiaowang Zhang & Zhiyong Feng

Corresponding author

Correspondence to Xiaowang Zhang.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Jie Tang
National University of Singapore, Singapore, Singapore
Min-Yen Kan
Peking University, Beijing, China
Dongyan Zhao
Peking University, Beijing, China
Sujian Li
Zhengzhou University, Zhengzhou, China
Hongying Zan

About this paper

Cite this paper

Tian, J., Chen, S., Zhang, X., Feng, Z. (2019). BSIL: A Brain Storm-Based Framework for Imbalanced Text Classification. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_5

DOI: https://doi.org/10.1007/978-3-030-32236-6_5
Published: 30 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32235-9
Online ISBN: 978-3-030-32236-6
eBook Packages: Computer Science, Computer Science (R0)