A scalable two-stage model for real-time Wetland bird recognition

The Journal of Supercomputing

Abstract

Traditional lightweight image classification algorithms are efficient but generally show low accuracy in real-time wetland bird recognition, because the environment is complex and bird species are highly similar. Moreover, a bird recognition server must run computation-intensive multi-process parallel inference, which requires the recognition algorithm to have low inference latency. Traditional high-accuracy fine-grained methods cannot meet this demand because of their high computational complexity. In this study, we introduce a scalable two-stage model for real-time wetland bird recognition that combines an object detector with bilinear pooling, a fine-grained image recognition technique, to encode fine-grained features. Additionally, we design a lightweight architecture and propose a bilinear scalable module within the bilinear pooling to trade off latency against accuracy. The experimental results show that the proposed method achieves 77.6% and 97.6% accuracy on the CUB and WPB datasets, respectively, much higher than the accuracy of MobileNetV3 and ShuffleNetV2, with a low inference latency of only 79.5 ms on CPU. Furthermore, parallel inference experiments in practical environments demonstrate that the proposed method achieves an inference speed of 15.3 FPS with 12 parallel video streams.
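To make the two-stage idea concrete, the following is a minimal NumPy sketch of how a detected bird region might be cropped and how generic bilinear pooling turns two feature maps into one fine-grained descriptor. It is not the authors' implementation: the function names `crop_detection` and `bilinear_pool`, the box format, and the random feature maps standing in for a lightweight backbone are all assumptions made for illustration, and the paper's bilinear scalable module is not reproduced here. The signed square root and L2 normalization follow the common bilinear CNN recipe.

```python
import numpy as np


def crop_detection(image, box):
    """Stage one output: cut the detected region out of the frame.

    `box` is a hypothetical (x0, y0, x1, y1) pixel box from some detector.
    """
    x0, y0, x1, y1 = box
    return image[y0:y1, x0:x1]


def bilinear_pool(feat_a, feat_b):
    """Stage two encoding: generic bilinear pooling of two feature maps.

    feat_a has shape (C1, H, W) and feat_b has shape (C2, H, W).
    Returns a descriptor of length C1 * C2.
    """
    c1, h, w = feat_a.shape
    c2 = feat_b.shape[0]
    a = feat_a.reshape(c1, h * w)
    b = feat_b.reshape(c2, h * w)
    # Outer product of the two feature vectors at every location,
    # averaged over all H * W locations.
    pooled = (a @ b.T) / (h * w)
    x = pooled.reshape(-1)
    # Signed square root followed by L2 normalization.
    x = np.sign(x) * np.sqrt(np.abs(x))
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((480, 640, 3))               # placeholder video frame
    bird = crop_detection(frame, (100, 50, 324, 274))
    # Random maps stand in for backbone features of the cropped region.
    feat_a = rng.standard_normal((16, 7, 7))
    feat_b = rng.standard_normal((16, 7, 7))
    descriptor = bilinear_pool(feat_a, feat_b)
    print(bird.shape, descriptor.shape)
```

Because the descriptor length is the product of the two channel counts, the number of channels fed into the pooling step directly controls the size of the classifier input, which is one reason a scalable pooling stage can exchange accuracy for latency.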


Data availability

The data associated with this work are available from the corresponding author upon formal request.

Acknowledgements

This research was supported by the National Natural Science Foundation Project of CQ CSTC (No. cstc2020jcyj-msxmX0554).

Author information

Corresponding author

Correspondence to Qing Zhou.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

About this article

Cite this article

Xia, W., Zhou, Q., Wu, D. et al. A scalable two-stage model for real-time Wetland bird recognition. J Supercomput 81, 588 (2025). https://doi.org/10.1007/s11227-025-07061-9

  • DOI: https://doi.org/10.1007/s11227-025-07061-9
