Abstract
Large-scale datasets for single-label multi-class classification, such as ImageNet-1k, have been instrumental in advancing deep learning and computer vision. However, a critical and often understudied aspect is the comprehensive quality assessment of these datasets, especially regarding potential multi-label annotation errors. In this paper, we introduce a lightweight, user-friendly, and scalable framework that synergizes human and machine intelligence for efficient dataset validation and quality enhancement. We term this novel framework Multilabelfy. Central to Multilabelfy is an adaptable web-based platform that systematically guides annotators through the re-evaluation process, effectively leveraging human-machine interactions to enhance dataset quality. Applying Multilabelfy to the ImageNetV2 dataset, we found that approximately 47.88% of the images contained at least two potential labels, underscoring the need for more rigorous assessments of such influential datasets. Furthermore, our analysis showed a negative correlation between the number of potential labels per image and model top-1 accuracy, illuminating a crucial factor in model evaluation and selection. Our open-source framework, Multilabelfy, offers a convenient, lightweight solution for dataset enhancement, with an emphasis on multi-label proportions. This study tackles major challenges in dataset integrity and provides key insights into model performance evaluation. Moreover, it underscores the advantages of integrating human expertise with machine capabilities to produce more robust models and trustworthy data development.
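To make the two reported analyses concrete, the following minimal sketch (ours, not the released Multilabelfy code; the image identifiers, label sets, and correctness criterion are illustrative assumptions) shows how one might compute the multi-label proportion and stratify top-1 accuracy by the number of plausible labels per image:

# Minimal sketch (not Multilabelfy's actual implementation): compute the
# proportion of images with >= 2 plausible labels and top-1 accuracy
# bucketed by label count. All inputs below are hypothetical.
from collections import defaultdict

# Hypothetical inputs: image id -> set of human-verified plausible labels,
# and image id -> the model's top-1 predicted label.
label_sets = {
    "img_0001": {"laptop", "notebook"},
    "img_0002": {"bagel"},
    "img_0003": {"seashore", "sandbar", "lakeside"},
}
top1_pred = {"img_0001": "laptop", "img_0002": "dough", "img_0003": "sandbar"}

# (a) Proportion of images carrying at least two plausible labels.
multi = sum(1 for labels in label_sets.values() if len(labels) >= 2)
print(f"multi-label proportion: {multi / len(label_sets):.2%}")

# (b) Top-1 accuracy per label-count bucket; a prediction counts as correct
# here if it falls anywhere in the verified label set.
hits, totals = defaultdict(int), defaultdict(int)
for img, labels in label_sets.items():
    k = len(labels)
    totals[k] += 1
    hits[k] += top1_pred[img] in labels
for k in sorted(totals):
    print(f"{k} label(s): top-1 accuracy {hits[k] / totals[k]:.2%}")

Under standard single-label evaluation, only the original ground-truth label is accepted, so images with several plausible labels are inherently harder to score as correct; that is one plausible reading of the negative correlation reported above.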
Acknowledgment
This research was supported by Ghent University Global Campus (GUGC) in Korea. This research was also supported by a National Research Foundation of Korea (NRF) grant (2020K1A3A1A68093469), funded by the Korean Ministry of Science and ICT (MSIT). We specifically thank the following people for their contributions to the annotation process: Gayoung Lee, Gyubin Lee, Herim Lee, Hyesoo Hong, Jihyung Yoo, Jin-Woo Park, Kangmin Kim, Jongbum Won, Sohee Lee, Sohn Yerim, Taeyoung Choi, Younghyun Kim, Yujin Cho, and Wonjun Yang.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Anzaku, E.T. et al. (2024). Leveraging Human-Machine Interactions for Computer Vision Dataset Quality Enhancement. In: Choi, B.J., Singh, D., Tiwary, U.S., Chung, W.Y. (eds) Intelligent Human Computer Interaction. IHCI 2023. Lecture Notes in Computer Science, vol 14531. Springer, Cham. https://doi.org/10.1007/978-3-031-53827-8_27
DOI: https://doi.org/10.1007/978-3-031-53827-8_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53826-1
Online ISBN: 978-3-031-53827-8