
Causal representation for few-shot text classification

  • Published in: Applied Intelligence

Abstract

Few-Shot Text Classification (FSTC) is a fundamental natural language processing problem that aims to classify text accurately from only a small number of labeled examples. Mainstream methods model superficial statistical relationships between text and labels; however, few-shot learning suffers from distributional imbalance, so questions remain about the robustness and generalization of these methods. These problems can be addressed through intrinsic causal mechanisms. We introduce a general structural causal model to formalize the FSTC problem. To extract causal associations from text and reconstruct that information for better classification, we propose the Causal Representation for Few-shot Learning (CRFL) framework, which forces representations to be causally related. Our framework performs well both when training examples are scarce and when it must generalize under data transfer. CRFL is orthogonal to existing fine-tuning and few-shot meta-learning methods and can be applied to any such task. Extensive experimental results on several widely used datasets validate the effectiveness of our approach, which we attribute to the model's stability and logical reasoning.
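The abstract's core idea, learning representations that track causal rather than superficial statistical associations, can be illustrated with a toy numpy sketch. This is not the paper's actual objective (which is not reproduced here); it is a hypothetical, minimal example of one common way to operationalize causal representation learning: penalizing representation changes under interventions on spurious features. The names `encode`, `invariance_penalty`, and the toy data split into "causal" and "spurious" dimensions are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    """Toy encoder: linear projection followed by tanh."""
    return np.tanh(x @ W)

def invariance_penalty(x, x_intervened, W):
    """Mean squared change in the representation when spurious
    features are intervened on. A causally grounded encoder should
    drive this penalty toward zero."""
    z = encode(x, W)
    z_tilde = encode(x_intervened, W)
    return float(np.mean((z - z_tilde) ** 2))

# Hypothetical inputs: first 2 dims are "causal", last 2 "spurious".
x = rng.normal(size=(8, 4))
x_intervened = x.copy()
x_intervened[:, 2:] = rng.normal(size=(8, 2))  # intervene on spurious dims

# An encoder that ignores spurious dims vs. one that mixes everything.
W_causal = np.vstack([np.eye(2), np.zeros((2, 2))])
W_mixed = rng.normal(size=(4, 2))

print(invariance_penalty(x, x_intervened, W_causal))  # exactly 0.0
print(invariance_penalty(x, x_intervened, W_mixed))   # strictly positive
```

In a full training loop, such a penalty would be added to the ordinary classification loss, pushing the learned encoder toward weights like `W_causal` that depend only on features causally related to the label.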




Funding

This work was supported by the Natural Science Foundation of China under Grants No. 61966038 and No. 62266051, and by the Science Foundation of Yunnan University under Grant 2021Z073.

Author information

Corresponding author

Correspondence to Xiaobing Zhou.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yang, M., Zhang, X., Wang, J. et al. Causal representation for few-shot text classification. Appl Intell 53, 21422–21432 (2023). https://doi.org/10.1007/s10489-023-04667-5
