A scalable two-stage model for real-time Wetland bird recognition

Xia, Wenyuan; Zhou, Qing; Wu, Dayu; Wang, Siyuan; Zhou, Mengshuang

doi:10.1007/s11227-025-07061-9

Published: 08 March 2025

The Journal of Supercomputing, volume 81, article number 588 (2025)

Wenyuan Xia¹, Qing Zhou¹, Dayu Wu¹, Siyuan Wang¹ & Mengshuang Zhou¹

Abstract

Traditional efficient lightweight image classification algorithms generally show low accuracy in real-time wetland bird recognition because of the complexity of the environment and the high visual similarity among bird species. Moreover, a bird recognition server must run computation-intensive, multi-process parallel inference, so the recognition algorithm needs a low inference latency. Traditional high-accuracy fine-grained methods cannot meet this demand because of their high computational complexity. In this study, we introduce a scalable two-stage model for real-time wetland bird recognition that combines an object detector with a fine-grained image recognition technique, bilinear pooling, to encode fine-grained features. Additionally, we design a lightweight architecture and propose a bilinear scalable module within the bilinear pooling to trade off latency against accuracy. The experimental results show that the proposed method achieves 77.6% and 97.6% accuracy on the CUB and WPB datasets, respectively, which is much higher than MobileNetV3 and ShuffleNetV2, with a low inference latency of only 79.5 ms on CPU. Furthermore, parallel inference experiments in practical environments demonstrate that the proposed method achieves an inference speed of 15.3 FPS with 12 parallel video streams.
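The abstract names bilinear pooling as the fine-grained encoder, but this preview does not show the architecture itself. The following is a minimal sketch of generic bilinear pooling only, not the authors' bilinear scalable module: two feature maps are combined by averaging the outer product of their channel activations over all spatial locations. The function name `bilinear_pool`, the toy inputs, and the signed square root plus L2 normalisation step (a common post-processing choice for bilinear features) are assumptions made for illustration.

```python
import math


def bilinear_pool(feat_a, feat_b):
    """Generic bilinear pooling of two feature maps (illustrative only).

    feat_a: list of C1 channels, each a list of N spatial activations.
    feat_b: list of C2 channels, each a list of N spatial activations.
    Returns a flat descriptor of length C1 * C2.
    """
    n = len(feat_a[0])
    if any(len(ch) != n for ch in feat_a + feat_b):
        raise ValueError("all channels must cover the same N locations")

    # Outer product averaged over locations:
    # pooled[i * C2 + j] = mean over k of feat_a[i][k] * feat_b[j][k]
    pooled = [
        sum(a * b for a, b in zip(ch_a, ch_b)) / n
        for ch_a in feat_a
        for ch_b in feat_b
    ]

    # Assumed post-processing: signed square root, then L2 normalisation.
    rooted = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in rooted))
    return [v / norm for v in rooted] if norm > 0 else rooted


# Hypothetical toy input: 2 channels x 4 locations and 3 channels x 4 locations.
feat_a = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
feat_b = [[1.0, 1.0, 0.0, 0.0], [0.0, 2.0, 1.0, 1.0], [1.0, 0.0, 0.0, 1.0]]
descriptor = bilinear_pool(feat_a, feat_b)
print(len(descriptor))  # 6 = C1 * C2
```

Because the descriptor length is the product of the two channel counts, shrinking either input reduces both the descriptor size and the cost of the classifier that consumes it. That is one plausible reason a scalable variant could trade latency against accuracy, though how the paper's module actually does so is not stated in this preview.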

Data availability

Data associated with this work can be obtained from the corresponding author upon formal request.

Acknowledgements

This research was supported by the National Natural Science Foundation Project of CQ CSTC (No. cstc2020jcyj-msxmX0554).

Author information

Authors and Affiliations

College of Computer Science, Chongqing University, No. 55 South University Town Road, Chongqing, 401331, China
Wenyuan Xia, Qing Zhou, Dayu Wu, Siyuan Wang & Mengshuang Zhou

Corresponding author

Correspondence to Qing Zhou.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xia, W., Zhou, Q., Wu, D. et al. A scalable two-stage model for real-time Wetland bird recognition. J Supercomput 81, 588 (2025). https://doi.org/10.1007/s11227-025-07061-9

Accepted: 12 February 2025
Published: 08 March 2025
DOI: https://doi.org/10.1007/s11227-025-07061-9
