
SFRSwin: A Shallow Significant Feature Retention Swin Transformer for Fine-Grained Image Classification of Wildlife Species

  • Conference paper
  • Published in: Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14433)


Abstract

Fine-grained image classification of wildlife species is a task of practical value, playing an important role in endangered-animal conservation, environmental protection and ecological conservation. However, the small differences between subclasses of wildlife and the large variation within the same subclass pose a great challenge to species classification. In addition, the feature-extraction capability of existing methods is insufficient: they ignore the role of effective shallow features and fail to identify subtle differences between images. To address these problems, this paper proposes an improved Swin Transformer architecture, called SFRSwin. Specifically, a shallow feature retention mechanism is proposed: a branch that extracts significant features from the shallow layers of the network, retaining important shallow image features and forming a dual-stream structure with the original network. SFRSwin was trained and tested on the public Stanford Dogs dataset and the small-scale Shark species dataset, achieving validation accuracies of 93.8% and 84.3%, improvements of 0.1% and 0.3% respectively over the unmodified Swin Transformer. In terms of complexity, FLOPs increased by only 2.7% and the number of parameters by only 0.15%.
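The dual-stream idea described above can be illustrated with a minimal, hypothetical sketch. This is not the authors' implementation: the function names, the top-k magnitude rule for "significant" shallow features, and the summation fusion are all assumptions made for illustration only.

```python
import numpy as np

def shallow_significant_features(x, k):
    """Illustrative 'significant feature' selection: keep only the k
    largest-magnitude activations per channel, zeroing the rest.
    x: (channels, positions) flattened shallow feature map."""
    out = np.zeros_like(x)
    for c in range(x.shape[0]):
        idx = np.argsort(np.abs(x[c]))[-k:]  # indices of the k most significant activations
        out[c, idx] = x[c, idx]
    return out

def dual_stream_logits(deep_logits, shallow_feats, w_shallow):
    """Fuse the main (deep) stream with the shallow branch.
    Here: global average pooling over positions, a hypothetical linear
    head w_shallow (channels, classes), then summation with the deep logits."""
    shallow_logits = shallow_feats.mean(axis=1) @ w_shallow
    return deep_logits + shallow_logits
```

In a real network the shallow branch would tap an early Swin stage and the deep stream would be the standard classification head; the sketch only shows how retaining a sparse set of salient shallow activations can contribute a second signal to the final prediction.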

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 62272256, the Shandong Provincial Natural Science Foundation under Grants ZR2021MF026 and ZR2023MF040, the Innovation Capability Enhancement Program for Small and Medium-sized Technological Enterprises of Shandong Province under Grants 2022TSGC2180 and 2022TSGC2123, the Innovation Team Cultivating Program of Jinan under Grant 202228093, and the Piloting Fundamental Research Program for the Integration of Scientific Research, Education and Industry of Qilu University of Technology (Shandong Academy of Sciences) under Grants 2021JC02014 and 2022XD001, the Talent Cultivation Promotion Program of Computer Science and Technology in Qilu University of Technology (Shandong Academy of Sciences) under Grants 2021PY05001 and 2023PY059.



Author information

Correspondence to Yubing Han.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Wang, S. et al. (2024). SFRSwin: A Shallow Significant Feature Retention Swin Transformer for Fine-Grained Image Classification of Wildlife Species. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14433. Springer, Singapore. https://doi.org/10.1007/978-981-99-8546-3_19


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8545-6

  • Online ISBN: 978-981-99-8546-3

  • eBook Packages: Computer Science, Computer Science (R0)
