Abstract
Fine-grained visual classification (FGVC) requires expert knowledge, which is expensive, and a large number of training samples. Consequently, sample data acquired from the web has emerged as a way to augment training sets. However, web data often includes noisy samples, which lead deep learning models to misclassify. This paper presents a meta-learning-based method called Data Reweighting Net (DR-Net). It uses a small, clean meta set as a guiding mechanism for learning accurately from web image datasets that contain noise. More specifically, DR-Net learns from the small, clean meta set to discard noisy samples, which it identifies by their low similarity, and to retain clean web samples. DR-Net also enables classification networks to learn adaptively from the training set through sample weighting, mitigating the impact of noisy labels on classification learning. Our experiments on the Web-bird, Web-aircraft, Web-car, CIFAR-10, and CIFAR-100 datasets demonstrate the feasibility of the proposed method.
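
To make the idea of similarity-guided sample weighting concrete, the following is a minimal sketch and not the DR-Net implementation. The abstract does not say how DR-Net computes similarities or weights (it is described as a meta-learning method), so everything here is an assumption for illustration: class prototypes are taken as mean feature vectors of the meta set, similarity is cosine similarity, and the `threshold` value and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def class_prototypes(meta_features, meta_labels):
    """Assumed stand-in for 'learning from the meta set': mean feature vector per class."""
    sums, counts = {}, {}
    for feat, label in zip(meta_features, meta_labels):
        acc = sums.setdefault(label, [0.0] * len(feat))
        for i, x in enumerate(feat):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {c: [x / counts[c] for x in acc] for c, acc in sums.items()}


def sample_weights(web_features, web_labels, prototypes, threshold=0.5):
    """Weight each web sample by its similarity to the prototype of its (possibly noisy) label.

    Samples whose similarity falls below the hypothetical threshold get weight 0,
    which discards them from the loss.
    """
    weights = []
    for feat, label in zip(web_features, web_labels):
        sim = cosine(feat, prototypes[label])
        weights.append(sim if sim >= threshold else 0.0)
    return weights


def weighted_cross_entropy(probs, labels, weights):
    """Cross-entropy averaged with per-sample weights; returns 0.0 if all weights are 0."""
    total = sum(weights)
    if total == 0:
        return 0.0
    loss = sum(-w * math.log(max(p[y], 1e-12)) for p, y, w in zip(probs, labels, weights))
    return loss / total


# Toy data: a clean two-class meta set and two web samples that both carry label 0.
meta_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
meta_labels = [0, 0, 1, 1]
web_features = [[1.0, 0.1], [0.1, 1.0]]  # the second sample looks like class 1
web_labels = [0, 0]

prototypes = class_prototypes(meta_features, meta_labels)
weights = sample_weights(web_features, web_labels, prototypes)
loss = weighted_cross_entropy([[0.8, 0.2], [0.3, 0.7]], web_labels, weights)
```

In this toy run the second web sample resembles class 1 but is labeled 0, so its similarity to the class-0 prototype is low and it receives weight 0. Only the first sample then contributes to the loss, which is the effect the abstract attributes to sample weighting.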






Funding
This research was funded by the Macau Science and Technology Development Funds [Grant number 0061/2020/A2].
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Liu, Y., Wu, Z., Lo, Sl. et al. Data reweighting net for web fine-grained image classification. Multimed Tools Appl 83, 79985–80005 (2024). https://doi.org/10.1007/s11042-024-18598-x
DOI: https://doi.org/10.1007/s11042-024-18598-x