SFDA: Chinese Diabetic Text Classification Based on Sentence Feature Level Data Augmentation

Wang, Qingyan; Wang, Ye; Lei, Dajiang

doi:10.1007/978-981-99-5847-4_43

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1870)

Included in the following conference series:

International Conference on Neural Computing for Advanced Applications

Abstract

Many type 2 diabetes patients and members of high-risk groups have a growing demand for specialized information on diabetes. However, the long-tail problem often makes model training difficult and reduces classification accuracy. In this paper, we propose a semantic feature enhancement approach to address the long-tail problem in Chinese diabetes text classification. In detail, we enrich the knowledge of the tail classes with a semantic feature enhancement module, and then use an attention aggregation module to improve the semantic representation by fusing these semantic features. For the semantic feature enhancement module, we propose two strategies: using different dropouts with the same pre-trained language model, and using different pre-trained language models. The attention aggregation module serves to better fuse the semantic features obtained previously. After processing by these two modules, we feed the final feature vector into the classifier. Our method achieved a final accuracy of 89.1% on Chinese diabetes text classification in the NCAA2023 assessment.
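The pipeline described in the abstract (several feature views of one sentence, attention-based fusion, then a classifier) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no architectural details, so the encoder here is a hypothetical stand-in (a random linear map with tanh) rather than a pre-trained language model, and the dropout rates, the dimensions, the single query vector used for attention scoring, and all function names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate, rng):
    """Inverted dropout: zero each entry with probability `rate`, rescale the rest."""
    if rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def encode(x, weight, rate, rng):
    """Hypothetical stand-in for a pre-trained language model (assumption)."""
    return np.tanh(dropout(x, rate, rng) @ weight)

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_aggregate(views, query):
    """Fuse K feature views of shape (K, batch, dim) into one (batch, dim) array."""
    scores = views @ query               # (K, batch): one score per view and sentence
    weights = softmax(scores, axis=0)    # normalise across the K views
    fused = (weights[..., None] * views).sum(axis=0)
    return fused, weights

# Toy sizes, chosen arbitrarily for the sketch.
batch, in_dim, dim, num_classes = 4, 16, 8, 3
x = rng.normal(size=(batch, in_dim))     # placeholder sentence inputs
encoder_w = rng.normal(size=(in_dim, dim))

# Strategy 1: the same encoder with different dropout rates gives several views.
views = np.stack([encode(x, encoder_w, r, rng) for r in (0.1, 0.2, 0.3)])

# Attention aggregation: a query vector (random here, learned in practice)
# scores each view, and the views are averaged with those weights.
query = rng.normal(size=dim)
fused, weights = attention_aggregate(views, query)

# Final classifier: a linear layer followed by softmax over the classes.
classifier_w = rng.normal(size=(dim, num_classes))
probs = softmax(fused @ classifier_w, axis=-1)
```

The second strategy in the abstract, using different pre-trained language models, would correspond in this sketch to stacking views produced by encoders with different weights instead of different dropout rates; the aggregation and classification steps stay the same.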

Acknowledgements

This work was supported in part by the National Key R&D Program of China (2021YFF0704100), the National Natural Science Foundation of China (62136002, 61876027, 61936001), the Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN202100627, KJQN202100629), and the National Natural Science Foundation of Chongqing (cstc2022ycjh-bgzxm0004, cstc2019jcyj-cxttX0002).

Author information

Authors and Affiliations

Chongqing Key Laboratory of Image Recognition, Chongqing University of Posts and Telecommunications, Chongqing, 40065, China
Qingyan Wang, Ye Wang & Dajiang Lei

Corresponding author

Correspondence to Dajiang Lei.

Editor information

Editors and Affiliations

Harbin Institute of Technology, Shenzhen, China
Haijun Zhang
Chaohu University, Hefei, China
Yinggen Ke
Chongqing University, Chongqing, China
Zhou Wu
South China Normal University, Guangzhou, China
Tianyong Hao
Hefei University of Technology, Hefei, China
Zhao Zhang
Technical University of Denmark, Kongens Lyngby, Denmark
Weizhi Meng
Chaohu University, Hefei, China
Yuanyuan Mu

About this paper

Cite this paper

Wang, Q., Wang, Y., Lei, D. (2023). SFDA: Chinese Diabetic Text Classification Based on Sentence Feature Level Data Augmentation. In: Zhang, H., et al. International Conference on Neural Computing for Advanced Applications. NCAA 2023. Communications in Computer and Information Science, vol 1870. Springer, Singapore. https://doi.org/10.1007/978-981-99-5847-4_43

DOI: https://doi.org/10.1007/978-981-99-5847-4_43
Published: 30 August 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-5846-7
Online ISBN: 978-981-99-5847-4
eBook Packages: Computer Science, Computer Science (R0)