
Deconfounded classification by an intervention approach

Original Article · International Journal of Machine Learning and Cybernetics

Abstract

For an automatic system, image classification is highly challenging, partly because of noise that obscures or reduces the clarity of the data. Noise suppression has therefore become one of the core tasks of classification. Classification methods commonly rest on a key assumption that any noise source affects either the images or the labels, but not both. This assumption overlooks the confounding scenario in which the same noise source affects both images and labels. In this paper, we propose an intervention approach to learning a deconfounded classification model. The classification problem is first formulated as causal inference, where intervention is used to untangle causation from mere correlation, and a causal effect formula for deconfounded classification is derived. The WAE (Wasserstein Auto-Encoder) objective is then extended to classification, with a new regularizer defined for learning unobserved confounders. To build a robust network architecture, a probability factorization is performed in conjunction with the d-separation rule to find useful dependency patterns in the data. The deconfounded classification model is finally obtained by rearranging the components of the learnt decoder according to the causal effect formula. The experimental results demonstrate that our approach significantly outperforms existing state-of-the-art classification models, particularly on imbalanced data.
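The causal effect formula itself is derived in the full text and is not reproduced on this page. As a point of reference, the standard backdoor adjustment from Pearl's framework illustrates the kind of expression the abstract describes: intervening on the image x replaces ordinary conditioning, and the (here unobserved, learnt) confounder z is averaged out under its own marginal rather than under its posterior given x:

$$
P(y \mid \mathrm{do}(x)) = \sum_{z} P(y \mid x, z)\, P(z)
$$

By contrast, the observational quantity $P(y \mid x) = \sum_{z} P(y \mid x, z)\, P(z \mid x)$ lets the confounder's influence on the image leak into the label prediction, which is precisely the correlation the intervention is meant to remove.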
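To make the pipeline concrete, the following is a minimal sketch, not the authors' implementation, of how the pieces described above could fit together: a WAE-style encoder that learns a latent confounder z, a decoder that reconstructs both images and labels, an MMD regularizer matching the aggregate latent to its prior, and a prediction routine that rearranges the learnt label decoder according to the backdoor formula. All names (DeconfoundedWAE, predict_do, lambda_z) and architectural choices are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def rbf_mmd(a, b, sigma=1.0):
    # Simple (biased) MMD^2 estimate with an RBF kernel; sufficient for a sketch.
    def k(x, y):
        return torch.exp(-torch.cdist(x, y) ** 2 / (2 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

class DeconfoundedWAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=8, n_classes=10, h=256):
        super().__init__()
        # q(z | x): deterministic encoder for the latent confounder
        self.encoder = nn.Sequential(
            nn.Linear(x_dim, h), nn.ReLU(), nn.Linear(h, z_dim))
        # p(x | z): image decoder
        self.decoder_x = nn.Sequential(
            nn.Linear(z_dim, h), nn.ReLU(), nn.Linear(h, x_dim))
        # p(y | x, z): label decoder, the component later rearranged
        # into the causal-effect estimate
        self.decoder_y = nn.Sequential(
            nn.Linear(x_dim + z_dim, h), nn.ReLU(), nn.Linear(h, n_classes))

    def loss(self, x, y, lambda_z=10.0):
        z = self.encoder(x)
        recon = F.mse_loss(self.decoder_x(z), x)                        # reconstruct images
        ce = F.cross_entropy(self.decoder_y(torch.cat([x, z], -1)), y)  # reconstruct labels
        reg = rbf_mmd(z, torch.randn_like(z))                           # match q(z) to N(0, I)
        return recon + ce + lambda_z * reg

    @torch.no_grad()
    def predict_do(self, x, n_samples=32):
        # Backdoor-style prediction: average p(y | x, z) over z ~ p(z),
        # instead of conditioning on the z inferred from x itself.
        z_dim = self.encoder[-1].out_features
        total = 0.0
        for _ in range(n_samples):
            z = torch.randn(x.size(0), z_dim, device=x.device)
            total = total + F.softmax(self.decoder_y(torch.cat([x, z], -1)), dim=-1)
        return total / n_samples
```

In training, one would call loss = model.loss(x_batch, y_batch) and backpropagate as usual; at test time, predict_do approximates $\sum_z P(y \mid x, z)\, P(z)$ by Monte Carlo sampling from the prior, which is the kind of rearrangement of decoder components the abstract alludes to.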


Data availability

The datasets analysed during the current study were derived from the following public-domain resources:

  • CIFAR: https://www.cs.toronto.edu/~kriz/cifar.html
  • Fashion-MNIST: https://www.worldlink.com.cn/en/osdir/fashion-mnist.html
  • SVHN: http://ufldl.stanford.edu/housenumbers/
  • Tiny ImageNet: https://www.kaggle.com/c/tiny-imagenet/data


Acknowledgements

The authors are grateful to all the members from Shanghai University and East China Normal University for their enthusiastic support. The authors would also like to thank the Science and Technology Commission of Shanghai Municipality (Grant No. 20ZR1416400) and the NSFC (Grant No. 61371149) for their financial support.

Author information


Corresponding author

Correspondence to Fenglei Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Yang, F., Han, J. & Li, B. Deconfounded classification by an intervention approach. Int. J. Mach. Learn. & Cyber. 13, 1763–1779 (2022). https://doi.org/10.1007/s13042-021-01486-3

