LocMix: local saliency-based data augmentation for image classification

  • Original Paper
  • Signal, Image and Video Processing

Abstract

Data augmentation is a crucial strategy for tackling issues such as inadequate model robustness and a large generalization gap. It combats overfitting, improves deep neural network performance, and enhances generalization, particularly when data are limited. In recent years, mixed sample data augmentation (MSDA), including variants such as Mixup and CutMix, has attracted significant attention. However, these methods can confound the network with misleading signals, which limits their effectiveness. In this context, we propose LocMix, an MSDA method that generates new training samples by prioritizing local saliency information and mixing data statistically. We conceal salient regions with random masks and combine images efficiently by optimizing local saliency information with transport-based methods. Prioritizing the local features within an image allows LocMix to capture image details more accurately and comprehensively, thereby enhancing the model's capacity to understand the target image. We validate this approach extensively on several challenging datasets. When applied to training a PreAct-ResNet18 model, our method yields notable gains: on CIFAR-10 we observe a 1.71% accuracy improvement, and on CIFAR-100, Tiny-ImageNet, ImageNet, and SVHN we attain accuracies of 80.12%, 64.60%, 77.62%, and 97.12%, corresponding to improvements of 4.88%, 8.75%, 1.93%, and 0.57%, respectively. These experimental results clearly demonstrate the effectiveness of the proposed method.
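To make the mixing idea concrete, the sketch below shows a minimal saliency-guided mixed-sample augmentation step in PyTorch. It illustrates the general technique the abstract describes, not the paper's exact LocMix algorithm: the gradient-based saliency estimate, the rectangular patch centered on the saliency peak, and the helper names (saliency_map, saliency_mix) are simplifying assumptions made for demonstration.

    # Illustrative sketch of saliency-guided mixed-sample augmentation
    # (in the spirit of LocMix; NOT the paper's exact algorithm).
    import torch
    import torch.nn.functional as F

    def saliency_map(model, x, y):
        """Gradient-magnitude saliency: |d loss / d pixel|, summed over channels."""
        x = x.clone().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        grad = torch.autograd.grad(loss, x)[0]
        return grad.abs().sum(dim=1)                       # (B, H, W)

    def saliency_mix(model, x, y, lam=0.5):
        """Paste a patch centered on each partner image's saliency peak;
        labels are later mixed in proportion to the pasted area."""
        b, _, h, w = x.shape
        perm = torch.randperm(b, device=x.device)
        sal = saliency_map(model, x[perm], y[perm])        # partner saliency
        peak = sal.reshape(b, -1).argmax(dim=1)            # flat index of peak
        cy, cx = peak // w, peak % w
        ph = int(h * (1.0 - lam) ** 0.5)                   # patch height
        pw = int(w * (1.0 - lam) ** 0.5)                   # patch width
        mixed = x.clone()
        for i in range(b):
            y0 = int((cy[i] - ph // 2).clamp(0, h - ph))   # keep patch in bounds
            x0 = int((cx[i] - pw // 2).clamp(0, w - pw))
            mixed[i, :, y0:y0 + ph, x0:x0 + pw] = x[perm[i], :, y0:y0 + ph, x0:x0 + pw]
        lam_adj = 1.0 - (ph * pw) / (h * w)                # fraction of original kept
        return mixed, y, y[perm], lam_adj

During training, the mixed batch would be used with a proportionally mixed loss, e.g. lam_adj * F.cross_entropy(out, y_a) + (1 - lam_adj) * F.cross_entropy(out, y_b), following the usual MSDA recipe.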

Data Availability

All datasets used in this work are sourced from publicly available repositories and can be downloaded from the respective official websites.

Funding

This work was funded by the National Natural Science Foundation of China under Grant No. 61772180 and by the Key R&D Plan of Hubei Province under Grant No. 2023BCB041.

Author information

Contributions

LY and YY performed the main manuscript work and experiments. WC and SY created Tables 3 and 4. All authors participated in manuscript review.

Corresponding author

Correspondence to Yu Ye.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yan, L., Ye, Y., Wang, C. et al. LocMix: local saliency-based data augmentation for image classification. SIViP 18, 1383–1392 (2024). https://doi.org/10.1007/s11760-023-02852-0
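For reference managers, the citation above corresponds to the following BibTeX entry. The entry key is arbitrary; all field values are taken from the citation line, with the remaining co-authors left abbreviated ("and others") as in the original "et al.":

    @article{yan2024locmix,
      author  = {Yan, L. and Ye, Y. and Wang, C. and others},
      title   = {LocMix: local saliency-based data augmentation for image classification},
      journal = {Signal, Image and Video Processing},
      volume  = {18},
      pages   = {1383--1392},
      year    = {2024},
      doi     = {10.1007/s11760-023-02852-0}
    }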
