
Robust Class-Specific Autoencoder for Data Cleaning and Classification in the Presence of Label Noise

Abstract

We present a simple but effective method for data cleaning and classification in the presence of label noise. The core idea is to treat data points with noisy labels as outliers of the class indicated by those labels. This lets us recast the traditionally supervised problem of classification with label noise as an unsupervised one: identifying outliers within each class. Finding such dubious observations (outliers) in each class is, however, challenging in general. We therefore propose to reduce their potential influence through class-specific feature learning with autoencoders. In particular, we learn for each class a feature space using all the samples labeled as that class, including those with noisy (but unknown to us) labels. Furthermore, to handle settings where the noise level is relatively high, we propose a weighted class-specific autoencoder that accounts for the effect of each data point on the postulated model. To fully exploit the learned class-specific feature spaces, we use a minimum-reconstruction-error criterion both to identify the outliers (label noise) and to perform classification. Experiments on several datasets show that the proposed method achieves state-of-the-art performance on data cleaning and classification with noisy labels.
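
As a rough illustration of the pipeline described above, the sketch below trains one autoencoder per (noisy) class and then classifies by minimum reconstruction error, flagging a sample as potential label noise when its minimum-error class disagrees with its given label. This is a minimal sketch, not the authors' implementation: a shallow scikit-learn MLPRegressor stands in for the paper's class-specific autoencoder, the weighted variant for high noise levels is omitted, and the function names and hidden-layer width are illustrative assumptions.

```python
# Minimal sketch of class-specific autoencoders with a minimum-
# reconstruction-error rule for label-noise detection and classification.
# NOTE: illustrative stand-in, not the authors' code; a shallow sklearn
# MLP plays the role of the autoencoder, and the paper's weighting
# scheme for high noise rates is omitted.
import numpy as np
from sklearn.neural_network import MLPRegressor


def train_class_autoencoders(X, y_noisy, hidden=32):
    """Fit one autoencoder per class on all samples carrying that label,
    including mislabeled ones (which the method treats as outliers)."""
    autoencoders = {}
    for c in np.unique(y_noisy):
        Xc = X[y_noisy == c]
        ae = MLPRegressor(hidden_layer_sizes=(hidden,), max_iter=500,
                          random_state=0)
        ae.fit(Xc, Xc)  # autoencoding: reconstruct the input from itself
        autoencoders[c] = ae
    return autoencoders


def classify_and_clean(autoencoders, X, y_noisy):
    """Assign each sample the class whose autoencoder reconstructs it
    with minimum error; flag it as potential label noise when that class
    disagrees with its given label."""
    classes = np.array(sorted(autoencoders))
    errors = np.stack(
        [((X - autoencoders[c].predict(X)) ** 2).mean(axis=1)
         for c in classes],
        axis=1)
    y_pred = classes[errors.argmin(axis=1)]
    suspected_noise = y_pred != y_noisy
    return y_pred, suspected_noise
```

Given a dataset (X, y_noisy), samples with suspected_noise set can be relabeled to y_pred or discarded before training a downstream classifier, mirroring the data-cleaning use of the method.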

Notes

  1. In general, the nature of outliers is hard to pin down, and there is no unanimous definition of an outlier in the literature [13, 37].

References

  1. Abellán J, Masegosa AR (2010) Bagging decision trees on data sets with classification noise. In: International symposium on foundations of information and knowledge systems. Springer, pp 248–265

  2. Aggarwal CC (ed) (2015) Outlier analysis. In: Data mining. Springer, Berlin, pp 237–263

  3. Bengio Y, Lamblin P, Popovici D, Larochelle H (2007) Greedy layer-wise training of deep networks. In: Advances in neural information processing systems, pp 153–160

  4. Biggio B, Nelson B, Laskov P (2011) Support vector machines under adversarial label noise. ACML 20:97–112

  5. Brodley CE, Friedl MA (1999) Identifying mislabeled training data. J Artif Intell Res 11:131–167

  6. Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv (CSUR) 41(3):15

  7. Ekambaram R, Fefilatyev S, Shreve M, Kramer K, Hall LO, Goldgof DB, Kasturi R (2016) Active cleaning of label noise. Pattern Recognit 51:463–480

  8. Fefilatyev S, Shreve M, Kramer K, Hall L, Goldgof D, Kasturi R, Daly K, Remsen A, Bunke H (2012) Label-noise reduction with support vector machines. In: 2012 21st International Conference on Pattern Recognition (ICPR). IEEE, pp 3504–3508

  9. Frénay B, Verleysen M (2014) Classification in the presence of label noise: a survey. IEEE Trans Neural Netw Learn Syst 25(5):845–869

  10. Gupta K, Majumdar A (2017) Imposing class-wise feature similarity in stacked autoencoders by nuclear norm regularization. Neural Process Lett 48:1–15

  11. Hawkins DM (1980) Identification of outliers, vol 11. Springer, Berlin

  12. De la Hoz E, De la Hoz E, Ortiz A, Ortega J, Martínez-Álvarez A (2014) Feature selection by multi-objective optimisation: application to network anomaly detection by hierarchical self-organising maps. Knowl Based Syst 71:322–338

  13. Huber PJ (2011) Robust statistics. Springer, Berlin

  14. Ipeirotis PG, Provost F, Wang J (2010) Quality management on Amazon Mechanical Turk. In: Proceedings of the ACM SIGKDD workshop on human computation. ACM, pp 64–67

  15. Jeatrakul P, Wong KW, Fung CC (2010) Data cleaning for classification using misclassification analysis. J Adv Comput Intell Intell Inform 14(3):297–302

  16. Kamimura R, Nakanishi S (1995) Feature detectors by autoencoders: decomposition of input patterns into atomic features by neural networks. Neural Process Lett 2(6):17–22

  17. Kingma DP, Welling M (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114

  18. Krishna RA, Hata K, Chen S, Kravitz J, Shamma DA, Fei-Fei L, Bernstein MS (2016) Embracing error to enable rapid crowdsourcing. In: Proceedings of the 2016 CHI conference on human factors in computing systems. ACM, pp 3167–3179

  19. Rätsch G, Onoda T, Müller KR (2001) Soft margins for AdaBoost. Mach Learn 42(3):287–320

  20. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

  21. Li W, Wang L, Li W, Agustsson E, Van Gool L (2017) Webvision database: visual learning and understanding from web data. arXiv preprint arXiv:1708.02862

  22. Liu T, Tao D (2016) Classification with noisy labels by importance reweighting. IEEE Trans Pattern Anal Mach Intell 38(3):447–461

  23. Makhzani A, Frey B (2013) K-sparse autoencoders. arXiv preprint arXiv:1312.5663

  24. Maria J, Amaro J, Falcao G, Alexandre LA (2016) Stacked autoencoders using low-power accelerated architectures for object recognition in autonomous systems. Neural Process Lett 43(2):445–458

  25. Natarajan N, Dhillon IS, Ravikumar PK, Tewari A (2013) Learning with noisy labels. In: Advances in neural information processing systems, pp 1196–1204

  26. Nettleton DF, Orriols-Puig A, Fornells A (2010) A study of the effect of different types of noise on the precision of supervised learning techniques. Artif Intell Rev 33(4):275–306

  27. Pechenizkiy M, Tsymbal A, Puuronen S, Pechenizkiy O (2006) Class noise and supervised learning in medical domains: the effect of feature extraction. In: 19th IEEE international symposium on computer-based medical systems. CBMS 2006. IEEE, pp 708–713

  28. Pruengkarn R, Wong KW, Fung CC (2016) Data cleaning using complementary fuzzy support vector machine technique. In: International conference on neural information processing. Springer, pp 160–167

  29. Qian Q, Hu J, Jin R, Pei J, Zhu S (2014) Distance metric learning using dropout: a structured regularization approach. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 323–332

  30. Rebbapragada UD (2010) Strategic targeting of outliers for expert review. Ph.D. thesis, Tufts University

  31. Rifai S, Vincent P, Muller X, Glorot X, Bengio Y (2011) Contractive auto-encoders: explicit invariance during feature extraction. In: Proceedings of the 28th international conference on machine learning (ICML-11), pp 833–840

  32. Rolnick D, Veit A, Belongie S, Shavit N (2017) Deep learning is robust to massive label noise. arXiv preprint arXiv:1705.10694

  33. Rätsch G, Schölkopf B, Smola AJ, Mika S, Onoda T, Müller KR (2000) Robust ensemble learning for data mining. In: Pacific-Asia conference on knowledge discovery and data mining, current issues and new applications, pp 341–344

  34. Sáez JA, Galar M, Luengo J, Herrera F (2014) Analyzing the presence of noise in multi-class problems: alleviating its influence with the one-vs-one decomposition. Knowl Inf Syst 38(1):179–206

  35. Teng CM (2005) Dealing with data corruption in remote sensing. In: International conference on advances in intelligent data analysis, pp 452–463

  36. Vidal R, Ma Y, Sastry S (2005) Generalized principal component analysis (GPCA). IEEE Trans Pattern Anal Mach Intell 27(12):1945–1959

  37. Vidal R, Ma Y, Sastry SS (2016) Robust principal component analysis. In: Antman SS (ed) Generalized principal component analysis. Springer, Berlin, pp 63–122

  38. Vincent P, Larochelle H, Bengio Y, Manzagol PA (2008) Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th international conference on machine learning. ACM, pp 1096–1103

  39. Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol PA (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 11:3371–3408

  40. Wang D, Tan X (2014) Robust distance metric learning in the presence of label noise. In: AAAI, pp 1321–1327

  41. Wang H, Nie F, Huang H (2014) Robust distance metric learning via simultaneous l1-norm minimization and maximization. In: International conference on machine learning, pp 1836–1844

  42. Yang L, Jin R, Sukthankar R (2012) Bayesian active distance metric learning. arXiv preprint arXiv:1206.5283

  43. Yang T, Mahdavi M, Jin R, Zhang L, Zhou Y (2012) Multiple kernel learning from noisy labels by stochastic programming. arXiv preprint arXiv:1206.4629

  44. Zhang W, Rekaya R, Bertrand K (2005) A method for predicting disease subtypes in presence of misclassification among training samples using gene expression: application to human breast cancer. Bioinformatics 22(3):317–325

  45. Zhang W, Wang D, Tan X (2018) Data cleaning and classification in the presence of label noise with class-specific autoencoder. In: International symposium on neural networks

  46. Zhu X, Wu X (2004) Class noise vs. attribute noise: a quantitative study. Artif Intell Rev 22(3):177–210

Acknowledgements

The authors thank the anonymous reviewers for their valuable comments and suggestions. This work is partially supported by National Science Foundation of China (61672280, 61373060, 61732006), AI+ Project of NUAA (56XZA18009), Jiangsu 333 Project (BRA2017377) and Qing Lan Project.

Author information

Correspondence to Xiaoyang Tan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This is an extended version of the paper published in ISNN 2018 [45].

Cite this article

Zhang, W., Wang, D. & Tan, X. Robust Class-Specific Autoencoder for Data Cleaning and Classification in the Presence of Label Noise. Neural Process Lett 50, 1845–1860 (2019). https://doi.org/10.1007/s11063-018-9963-9
