Abstract
Image classification remains highly challenging for automatic systems, partly because noise obscures or degrades the data; noise suppression has therefore become one of the core tasks in classification. A key assumption commonly made in classification is that any given noise source affects either the images or the labels, but not both. This assumption, however, overlooks the confounding scenario in which the same noise source affects images and labels simultaneously. In this paper, we propose an intervention approach to learning a deconfounded classification model. The classification problem is first formulated as causal inference, where intervention is used to disentangle causation from mere correlation and to derive a causal effect formula for deconfounded classification. The WAE (Wasserstein Auto-Encoder) objective is then extended to classification, with a new regularizer defined for learning unobserved confounders. To build a robust network architecture, a probability factorization is performed in conjunction with the d-separation rule to identify useful dependency patterns in the data. The deconfounded classification model is finally obtained by rearranging the components of the learnt decoder according to the causal effect formula. Experimental results demonstrate that our approach significantly outperforms existing state-of-the-art classification models, particularly on imbalanced data.
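Although the full derivation appears in the body of the paper, the causal effect formula referred to above plausibly takes the standard form of Pearl's backdoor adjustment, with Z denoting the unobserved confounder that jointly affects images X and labels Y (a sketch of our reading, not the paper's exact notation):

$$P(y \mid do(x)) \;=\; \sum_{z} P(y \mid x, z)\, P(z),$$

so that a deconfounded classifier can be assembled from the learnt decoder's components P(y | x, z) by marginalizing over the confounder prior P(z). As a concrete illustration of how an expanded WAE objective might combine reconstruction, latent matching, and a confounder-learning regularizer, the following Python sketch is a minimal, hedged rendering: the module names (encoder, decoder, classifier), the WAE-MMD latent penalty, and the weights lam and gamma are our assumptions, not the paper's exact loss.

import torch
import torch.nn.functional as F

def mmd_rbf(z_q, z_p, sigma=1.0):
    # RBF-kernel MMD between encoded codes z_q and prior samples z_p,
    # the latent-matching penalty used in the WAE-MMD variant.
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(z_q, z_q).mean() + k(z_p, z_p).mean() - 2 * k(z_q, z_p).mean()

def deconfounded_wae_loss(x, y, encoder, decoder, classifier, lam=10.0, gamma=1.0):
    z = encoder(x)                           # code sampled from Q(Z|X)
    x_rec = decoder(z)                       # reconstruction through the decoder
    z_prior = torch.randn_like(z)            # samples from a Gaussian prior P(Z)
    rec = F.mse_loss(x_rec, x)               # WAE reconstruction cost
    match = mmd_rbf(z, z_prior)              # WAE latent-matching term
    cls = F.cross_entropy(classifier(z), y)  # hypothetical confounder-aware classification regularizer
    return rec + lam * match + gamma * cls

Under these assumptions, the classifier head consumes the learnt latent code, so at test time predictions can be averaged over prior samples of z, mirroring the marginalization in the adjustment formula.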






Data availability
The datasets analysed during the current study were derived from the following public domain resources: https://www.cs.toronto.edu/~kriz/cifar.html, https://www.worldlink.com.cn/en/osdir/fashion-mnist.html, http://ufldl.stanford.edu/housenumbers/, https://www.kaggle.com/c/tiny-imagenet/data.
Acknowledgements
The authors are grateful to all the members from Shanghai University and East China Normal University for their enthusiastic support. The authors also gratefully acknowledge the financial support of the Science and Technology Commission of Shanghai Municipality (Grant No. 20ZR1416400) and the NSFC (Grant No. 61371149).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Yang, F., Han, J. & Li, B. Deconfounded classification by an intervention approach. Int. J. Mach. Learn. & Cyber. 13, 1763–1779 (2022). https://doi.org/10.1007/s13042-021-01486-3