Combining PENCIL with AMDIM for image classification with noisy and sparsely labeled data

ABSTRACT
Recent years have seen an increase in data availability and computational power, which has led to superior performance in training deep learning models for image classification. In many real-world use cases, however, training datasets come with noisy labels that have been automatically generated for only a small subset of the available data. In this paper, we explore strategies for maintaining classification performance when the labels become noisy and sparse. In particular, we evaluate the effectiveness of combining PENCIL, a framework for correcting noisy labels during training, with AMDIM, a self-supervised technique for learning good data representations from unlabeled data. We find that this combination is significantly more effective at dealing with sparse and noisy labels than using either approach alone.
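To make the combination concrete, below is a minimal PyTorch sketch, not the paper's implementation: a simplified single-scale InfoNCE objective stands in for AMDIM's multi-scale mutual-information loss during self-supervised pretraining, and a PENCIL-style joint loss over network predictions and learnable per-example label distributions handles the noisy, sparse labels. All names (info_nce_loss, PencilLabels, pencil_loss) and hyperparameter values (temperature, scale, alpha, beta) are illustrative assumptions.

```python
# Hypothetical sketch of the pipeline described above; names and
# hyperparameters (temperature, scale, alpha, beta) are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


def info_nce_loss(z1, z2, temperature=0.1):
    """Simplified single-scale contrastive objective standing in for
    AMDIM's multi-scale loss: features of two augmented views of the
    same image are positives; other images in the batch are negatives."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature               # (B, B) similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)


class PencilLabels(nn.Module):
    """PENCIL-style learnable label distribution per training example,
    initialized from the noisy one-hot labels."""

    def __init__(self, noisy_labels, num_classes, scale=10.0):
        super().__init__()
        init = F.one_hot(noisy_labels, num_classes).float() * scale
        self.y_tilde = nn.Parameter(init)            # (N, C) label logits

    def forward(self, idx):
        return F.softmax(self.y_tilde[idx], dim=1)   # corrected soft labels


def pencil_loss(logits, soft_labels, noisy_labels, alpha=0.4, beta=0.1):
    """PENCIL-style joint loss: fit the corrected labels, stay compatible
    with the observed noisy labels, and regularize prediction entropy."""
    log_pred = F.log_softmax(logits, dim=1)
    lc = F.kl_div(log_pred, soft_labels, reduction='batchmean')
    lo = F.nll_loss(torch.log(soft_labels + 1e-8), noisy_labels)
    le = -(log_pred.exp() * log_pred).sum(dim=1).mean()
    return lc + alpha * lo + beta * le
```

Under this setup, the encoder would first be pretrained with info_nce_loss on two augmented views of every image, labeled or not, and then fine-tuned together with a classifier head on the labeled subset while the PencilLabels parameters are updated jointly (PENCIL proposes a separate, much larger learning rate for the label logits).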
REFERENCES
- Görkem Algan and Ilkay Ulusoy. 2019. Image Classification with Deep Learning in the Presence of Noisy Labels: A Survey. arXiv:1912.05170 [cs, stat] (Dec. 2019).
- Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, and Simon Lacoste-Julien. 2017. A Closer Look at Memorization in Deep Networks. arXiv:1706.05394 [cs, stat] (July 2017).
- Philip Bachman, R Devon Hjelm, and William Buchwalter. 2019. Learning Representations by Maximizing Mutual Information Across Views. In Advances in Neural Information Processing Systems (NeurIPS 2019).
- Benoit Frenay and Michel Verleysen. 2014. Classification in the Presence of Label Noise: A Survey. IEEE Transactions on Neural Networks and Learning Systems 25, 5 (May 2014), 845–869. https://doi.org/10.1109/TNNLS.2013.2292894
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Identity Mappings in Deep Residual Networks. arXiv:1603.05027 [cs] (July 2016).
- Olivier J. Hénaff, Aravind Srinivas, Jeffrey De Fauw, Ali Razavi, Carl Doersch, S. M. Ali Eslami, and Aaron van den Oord. 2019. Data-Efficient Image Recognition with Contrastive Predictive Coding. arXiv:1905.09272 [cs] (Dec. 2019).
- R. Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, and Yoshua Bengio. 2019. Learning Deep Representations by Mutual Information Estimation and Maximization. arXiv:1808.06670 [cs, stat] (Feb. 2019).
- Longlong Jing and Yingli Tian. 2019. Self-Supervised Visual Feature Learning with Deep Neural Networks: A Survey. arXiv:1902.06162 [cs] (Feb. 2019).
- Alex Krizhevsky. 2009. Learning Multiple Layers of Features from Tiny Images. Technical Report, University of Toronto.
- Kuang-Huei Lee, Xiaodong He, Lei Zhang, and Linjun Yang. 2018. CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise. arXiv:1711.07131 [cs] (March 2018).
- Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, and Xiaoou Tang. 2016. DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Vasileios Papapanagiotou, Christos Diou, and Anastasios Delopoulos. 2015. Improving Concept-Based Image Retrieval with Training Weights Computed from Tags. ACM Trans. Multimedia Comput. Commun. Appl. 12, 2, Article 32 (Nov. 2015), 22 pages. https://doi.org/10.1145/2790230
- Ioannis Sarafis, Christos Diou, Theodora Tsikrika, and Anastasios Delopoulos. 2014. Weighted SVM from Clickthrough Data for Image Retrieval. In 2014 IEEE International Conference on Image Processing (ICIP). IEEE, 3013–3017.
- Chen Sun, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta. 2017. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era. In Proceedings of the IEEE International Conference on Computer Vision (ICCV). 843–852.
- Daiki Tanaka, Daiki Ikami, Toshihiko Yamasaki, and Kiyoharu Aizawa. 2018. Joint Optimization Framework for Learning with Noisy Labels. arXiv:1803.11364 [cs, stat] (March 2018).
- Theodora Tsikrika, Christos Diou, Arjen P. de Vries, and Anastasios Delopoulos. 2009. Image Annotation Using Clickthrough Data. In Proceedings of the ACM International Conference on Image and Video Retrieval (CIVR '09). ACM, New York, NY, USA, Article 14, 8 pages. https://doi.org/10.1145/1646396.1646415
- Theodora Tsikrika, Christos Diou, Arjen P. de Vries, and Anastasios Delopoulos. 2011. Reliability and Effectiveness of Clickthrough Data for Automatic Image Annotation. Multimedia Tools and Applications 55, 1 (2011), 27–52.
- Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2019. Representation Learning with Contrastive Predictive Coding. arXiv:1807.03748 [cs, stat] (Jan. 2019).
- Jesper E. Van Engelen and Holger H. Hoos. 2020. A Survey on Semi-Supervised Learning. Machine Learning 109, 2 (2020), 373–440.
- Andreas Veit, Neil Alldrin, Gal Chechik, Ivan Krasin, Abhinav Gupta, and Serge Belongie. 2017. Learning From Noisy Large-Scale Datasets With Minimal Supervision. arXiv:1701.01619 [cs] (April 2017).
- Kun Yi and Jianxin Wu. 2019. Probabilistic End-to-End Noise Correction for Learning with Noisy Labels. arXiv:1903.07788 [cs] (March 2019).