
Deconfounded classification by an intervention approach

Original Article · International Journal of Machine Learning and Cybernetics

Abstract

For an automatic system, image classification is highly challenging, partly because of noise that obscures or reduces the clarity of the data. Noise suppression has therefore become one of the core tasks of classification. Classification methods commonly rest on a key assumption that any noise source affects either the images or the labels, but not both. This assumption overlooks the confounding scenario in which the same noise source affects both images and labels. In this paper, we propose an intervention approach to learning a deconfounded classification model. The classification problem is first formulated as causal inference, where intervention is used to untangle causation from mere correlation, and a causal effect formula for deconfounded classification is derived. The WAE (Wasserstein Auto-Encoder) objective is then extended to classification, with a new regularizer defined for learning unobserved confounders. To build a robust network architecture, a probability factorization is performed in conjunction with the d-separation rule to find useful dependency patterns in the data. The deconfounded classification model is finally obtained by rearranging the components of the learnt decoder according to the causal effect formula. The experimental results demonstrate that our approach significantly outperforms existing state-of-the-art classification models, particularly on imbalanced data.
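The causal effect formula itself is derived in the full text and is not reproduced on this page. As a point of reference, the standard backdoor adjustment from Pearl's framework illustrates the kind of expression the abstract describes: intervening on the image x replaces ordinary conditioning, and the (here unobserved, learnt) confounder z is averaged out under its own marginal rather than under its posterior given x:

$$
P(y \mid \mathrm{do}(x)) = \sum_{z} P(y \mid x, z)\, P(z)
$$

By contrast, the observational quantity $P(y \mid x) = \sum_{z} P(y \mid x, z)\, P(z \mid x)$ lets the confounder's influence on the image leak into the label prediction, which is precisely the correlation the intervention is meant to remove.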
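To make the pipeline concrete, the following is a minimal sketch, not the authors' implementation, of how the pieces described above could fit together: a WAE-style encoder that learns a latent confounder z, a decoder that reconstructs both images and labels, an MMD regularizer matching the aggregate latent to its prior, and a prediction routine that rearranges the learnt label decoder according to the backdoor formula. All names (DeconfoundedWAE, predict_do, lambda_z) and architectural choices are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def rbf_mmd(a, b, sigma=1.0):
    # Simple (biased) MMD^2 estimate with an RBF kernel; sufficient for a sketch.
    def k(x, y):
        return torch.exp(-torch.cdist(x, y) ** 2 / (2 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

class DeconfoundedWAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=8, n_classes=10, h=256):
        super().__init__()
        # q(z | x): deterministic encoder for the latent confounder
        self.encoder = nn.Sequential(
            nn.Linear(x_dim, h), nn.ReLU(), nn.Linear(h, z_dim))
        # p(x | z): image decoder
        self.decoder_x = nn.Sequential(
            nn.Linear(z_dim, h), nn.ReLU(), nn.Linear(h, x_dim))
        # p(y | x, z): label decoder, the component later rearranged
        # into the causal-effect estimate
        self.decoder_y = nn.Sequential(
            nn.Linear(x_dim + z_dim, h), nn.ReLU(), nn.Linear(h, n_classes))

    def loss(self, x, y, lambda_z=10.0):
        z = self.encoder(x)
        recon = F.mse_loss(self.decoder_x(z), x)                        # reconstruct images
        ce = F.cross_entropy(self.decoder_y(torch.cat([x, z], -1)), y)  # reconstruct labels
        reg = rbf_mmd(z, torch.randn_like(z))                           # match q(z) to N(0, I)
        return recon + ce + lambda_z * reg

    @torch.no_grad()
    def predict_do(self, x, n_samples=32):
        # Backdoor-style prediction: average p(y | x, z) over z ~ p(z),
        # instead of conditioning on the z inferred from x itself.
        z_dim = self.encoder[-1].out_features
        total = 0.0
        for _ in range(n_samples):
            z = torch.randn(x.size(0), z_dim, device=x.device)
            total = total + F.softmax(self.decoder_y(torch.cat([x, z], -1)), dim=-1)
        return total / n_samples
```

In training, one would call loss = model.loss(x_batch, y_batch) and backpropagate as usual; at test time, predict_do approximates $\sum_z P(y \mid x, z)\, P(z)$ by Monte Carlo sampling from the prior, which is the kind of rearrangement of decoder components the abstract alludes to.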


Data availability

The datasets analysed during the current study were derived from the following public-domain resources:

  • CIFAR: https://www.cs.toronto.edu/~kriz/cifar.html
  • Fashion-MNIST: https://www.worldlink.com.cn/en/osdir/fashion-mnist.html
  • SVHN: http://ufldl.stanford.edu/housenumbers/
  • Tiny ImageNet: https://www.kaggle.com/c/tiny-imagenet/data


Acknowledgements

The authors are grateful to all the members from Shanghai University and East China Normal University for their enthusiastic support. The authors would also like to thank the Science and Technology Commission of Shanghai Municipality (Grant No. 20ZR1416400) and the NSFC (Grant No. 61371149) for their financial support.

Author information


Corresponding author

Correspondence to Fenglei Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Yang, F., Han, J. & Li, B. Deconfounded classification by an intervention approach. Int. J. Mach. Learn. & Cyber. 13, 1763–1779 (2022). https://doi.org/10.1007/s13042-021-01486-3

