SFDA: Chinese Diabetic Text Classification Based on Sentence Feature Level Data Augmentation

  • Conference paper
International Conference on Neural Computing for Advanced Applications (NCAA 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1870)

Abstract

Many type 2 diabetes patients and people in high-risk groups have a growing demand for specialized information on diabetes. However, the long-tail problem often makes model training difficult and reduces classification accuracy. In this paper, we propose a semantic feature enhancement approach to address the long-tail problem in Chinese diabetes text classification. In detail, we enrich the knowledge of the tail classes with an enhancing semantic feature module, and then use an attention aggregation module to improve the semantic representation by fusing these semantic features. For the enhancing semantic feature module, we propose two strategies: using different dropouts with the same pre-trained language model, and using different pre-trained language models. The attention aggregation module is designed to better fuse the semantic features obtained previously. After processing by these two modules, we feed the final feature vector into the classifier. Our method achieved a final accuracy of 89.1% on Chinese diabetes text classification in the NCAA 2023 assessment.
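
The pipeline described in the abstract (several dropout views of one sentence feature, attention-based fusion, then a classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the random vector stands in for the output of a pre-trained language model, dropout is applied to that output vector rather than inside the encoder, and the names `augment_views`, `attention_aggregate`, `classify`, the `query` vector, and the dropout rates are all hypothetical choices made for illustration.

```python
import math
import random


def dropout(vec, rate, rng):
    """Inverted dropout: zero each entry with probability `rate`, rescale the rest."""
    if rate <= 0.0:
        return list(vec)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def augment_views(feature, rates, rng):
    """Build one view per dropout rate, all from the same sentence feature."""
    return [dropout(feature, r, rng) for r in rates]


def attention_aggregate(views, query):
    """Fuse the views into one vector using softmax attention weights."""
    dim = len(query)
    scores = [dot(v, query) / math.sqrt(dim) for v in views]
    weights = softmax(scores)
    fused = [sum(w * v[i] for w, v in zip(weights, views)) for i in range(dim)]
    return fused, weights


def classify(fused, class_weights):
    """Linear layer followed by softmax; returns class probabilities."""
    return softmax([dot(fused, w) for w in class_weights])


rng = random.Random(0)
dim, num_classes = 8, 3

# Assumption: a random vector stands in for a PLM sentence embedding.
feature = [rng.uniform(-1, 1) for _ in range(dim)]
# Assumption: in a real model the query and classifier weights would be learned.
query = [rng.uniform(-1, 1) for _ in range(dim)]
class_weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_classes)]

views = augment_views(feature, rates=[0.1, 0.2, 0.3], rng=rng)
fused, attn = attention_aggregate(views, query)
probs = classify(fused, class_weights)
```

The second strategy in the abstract (different pre-trained language models) would fit the same shape: each model would contribute one entry to `views`, provided the features share a dimension or are projected to one.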

Acknowledgements

This work was partly supported by the National Key R&D Program of China (2021YFF0704100), the National Natural Science Foundation of China (62136002, 61876027, 61936001), the Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN202100627 and KJQN202100629), and the National Natural Science Foundation of Chongqing (cstc2022ycjh-bgzxm0004, cstc2019jcyj-cxttX0002).

Author information

Corresponding author

Correspondence to Dajiang Lei.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wang, Q., Wang, Y., Lei, D. (2023). SFDA: Chinese Diabetic Text Classification Based on Sentence Feature Level Data Augmentation. In: Zhang, H., et al. International Conference on Neural Computing for Advanced Applications. NCAA 2023. Communications in Computer and Information Science, vol 1870. Springer, Singapore. https://doi.org/10.1007/978-981-99-5847-4_43

  • DOI: https://doi.org/10.1007/978-981-99-5847-4_43

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-5846-7

  • Online ISBN: 978-981-99-5847-4

  • eBook Packages: Computer Science, Computer Science (R0)
