A Real-Time Multi-Stage Architecture for Pose Estimation of Zebrafish Head with Convolutional Neural Networks

Huang, Zhang-Jin; He, Xiang-Xiang; Wang, Fang-Jun; Shen, Qing

doi:10.1007/s11390-021-9599-5

Regular Paper
Published: 31 March 2021

Volume 36, pages 434–444 (2021)
Journal of Computer Science and Technology

Abstract

To conduct optical neurophysiology experiments on a freely swimming zebrafish, it is essential to quantify the pose of the zebrafish head in order to determine exact lighting positions. To quantify the behavior of the zebrafish head efficiently with limited computing resources, we propose a real-time multi-stage architecture based on convolutional neural networks for pose estimation of the zebrafish head on CPUs. Each stage is implemented with a small neural network. In the first stage, a lightweight object detector named Micro-YOLO detects a coarse region containing the zebrafish head. In the second stage, a tiny bounding box refinement network, RegressNet, produces a high-quality bounding box around the zebrafish head. Finally, a small pose estimation network named tiny-hourglass detects keypoints on the zebrafish head. Experimental results show that combining Micro-YOLO with RegressNet to predict the zebrafish head region is both more accurate and much faster than Faster R-CNN, a representative two-stage detector. Compared with DeepLabCut, a state-of-the-art method for estimating the poses of user-defined body parts, our multi-stage architecture achieves higher accuracy and runs 19 times faster on CPUs.
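The abstract describes a detect, refine, then localize cascade. A minimal sketch of how such stages could be chained is given below, assuming plain Python callables stand in for the three networks (Micro-YOLO, RegressNet, tiny-hourglass). The function names, the R-CNN-style (dx, dy, dw, dh) box offsets, and the assumption that each keypoint heatmap spans the refined box are illustrative choices, not details taken from the paper.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

# A box is (x1, y1, x2, y2) in full-frame pixel coordinates.
Box = Tuple[float, float, float, float]
Heatmap = Sequence[Sequence[float]]


def apply_box_deltas(box: Box, deltas: Sequence[float]) -> Box:
    """Refine a coarse box with (dx, dy, dw, dh) regression offsets.

    Assumption: R-CNN-style parameterisation, where centre shifts are
    fractions of the box size and size changes are log-scale factors.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + w / 2, y1 + h / 2
    dx, dy, dw, dh = deltas
    cx, cy = cx + dx * w, cy + dy * h
    w, h = w * math.exp(dw), h * math.exp(dh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def heatmap_peaks(heatmaps: Sequence[Heatmap]) -> List[Tuple[int, int]]:
    """Return the (col, row) cell with the highest score in each heatmap."""
    peaks = []
    for hm in heatmaps:
        row, col = max(
            ((r, c) for r in range(len(hm)) for c in range(len(hm[0]))),
            key=lambda rc: hm[rc[0]][rc[1]],
        )
        peaks.append((col, row))
    return peaks


def estimate_head_pose(
    frame: Any,
    detect: Callable[[Any], Box],                        # stage 1 (Micro-YOLO stand-in)
    refine: Callable[[Any, Box], Sequence[float]],       # stage 2 (RegressNet stand-in)
    keypoints: Callable[[Any, Box], Sequence[Heatmap]],  # stage 3 (tiny-hourglass stand-in)
) -> List[Tuple[float, float]]:
    """Chain the three hypothetical stages; return keypoints in frame coordinates."""
    coarse = detect(frame)                                 # coarse head region
    box = apply_box_deltas(coarse, refine(frame, coarse))  # refined head box
    heatmaps = keypoints(frame, box)                       # one heatmap per keypoint
    # Assumption: each heatmap spans the refined box, so a cell centre maps
    # linearly back into the full frame.
    x1, y1, x2, y2 = box
    rows, cols = len(heatmaps[0]), len(heatmaps[0][0])
    return [
        (x1 + (c + 0.5) * (x2 - x1) / cols, y1 + (r + 0.5) * (y2 - y1) / rows)
        for c, r in heatmap_peaks(heatmaps)
    ]


# Usage with dummy stages: a 4x4 heatmap whose peak is at row 1, column 2.
hm = [[0.0] * 4 for _ in range(4)]
hm[1][2] = 1.0
print(estimate_head_pose(
    frame=None,
    detect=lambda f: (10.0, 20.0, 50.0, 60.0),
    refine=lambda f, b: (0.0, 0.0, 0.0, 0.0),
    keypoints=lambda f, b: [hm],
))  # [(35.0, 35.0)]
```

Passing the networks in as callables keeps the orchestration testable without model weights; in a real deployment each callable would presumably crop and resize the frame to its network's input size before inference.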

References

Cong L, Wang Z, Chai Y, Han W, Shang C, Yang W, Bai L, Du J, Wang K, Wen Q. Rapid whole brain imaging of neural activity in freely behaving larval zebrafish (Danio rerio). eLife, 2017, 6: Article No. e28158. https://doi.org/10.7554/elife.28158.
Xu Z P, Cheng X E. Zebrafish tracking using convolutional neural networks. Scientific Reports, 2017, 7: Article No. 42815. https://doi.org/10.1038/srep42815.
Mathis A, Mamidanna P, Cury K M, Abe T, Murthy V N, Mathis M W, Bethge M. DeepLabCut: Markerless pose estimation of user-defined body parts with deep learning. Nature Neuroscience, 2018, 21: 1281-1289. https://doi.org/10.1038/s41593-018-0209-y.
Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proc. the 2014 IEEE Conference on Computer Vision and Pattern Recognition, June 2014, pp.580-587. https://doi.org/10.1109/CVPR.2014.81.
Girshick R. Fast R-CNN. In Proc. the 2015 IEEE International Conference on Computer Vision, December 2015, pp.1440-1448. https://doi.org/10.1109/ICCV.2015.169.
Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proc. the 29th Annual Conference on Neural Information Processing Systems, December 2015, pp.91-99.
Dai J, Li Y, He K, Sun J. R-FCN: Object detection via region-based fully convolutional networks. In Proc. the 30th Annual Conference on Neural Information Processing Systems, December 2016, pp.379-387.
Uijlings J R, van de Sande K E, Gevers T, Smeulders A W. Selective search for object recognition. International Journal of Computer Vision, 2013, 104(2): 154-171. https://doi.org/10.1007/s11263-013-0620-5.
Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: Unified, real-time object detection. In Proc. the 2016 IEEE Conference on Computer Vision and Pattern Recognition, June 2016, pp.779-788. https://doi.org/10.1109/CVPR.2016.91.
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y, Berg A C. SSD: Single shot multibox detector. In Proc. the 14th European Conference on Computer Vision, October 2016, pp.21-37. https://doi.org/10.1007/978-3-319-46448-0_2.
Cai Z, Vasconcelos N. Cascade R-CNN: Delving into high quality object detection. In Proc. the 2018 IEEE Conference on Computer Vision and Pattern Recognition, June 2018, pp.6154-6162. https://doi.org/10.1109/CVPR.2018.00644.
Toshev A, Szegedy C. DeepPose: Human pose estimation via deep neural networks. In Proc. the 2014 IEEE Conference on Computer Vision and Pattern Recognition, June 2014, pp.1653-1660. https://doi.org/10.1109/CVPR.2014.214.
Pfister T, Simonyan K, Charles J, Zisserman A. Deep convolutional neural networks for efficient pose estimation in gesture videos. In Proc. the 12th Asian Conference on Computer Vision, November 2014, pp.538-552. https://doi.org/10.1007/978-3-319-16865-4_35.
Carreira J, Agrawal P, Fragkiadaki K, Malik J. Human pose estimation with iterative error feedback. In Proc. the 2016 IEEE Conference on Computer Vision and Pattern Recognition, June 2016, pp.4733-4742. https://doi.org/10.1109/CVPR.2016.512.
Pfister T, Charles J, Zisserman A. Flowing ConvNets for human pose estimation in videos. In Proc. the 2015 IEEE International Conference on Computer Vision, December 2015, pp.1913-1921. https://doi.org/10.1109/ICCV.2015.222.
Wei S E, Ramakrishna V, Kanade T, Sheikh Y. Convolutional pose machines. In Proc. the 2016 IEEE Conference on Computer Vision and Pattern Recognition, June 2016, pp.4724-4732. https://doi.org/10.1109/CVPR.2016.511.
Newell A, Yang K, Deng J. Stacked hourglass networks for human pose estimation. In Proc. the 14th European Conference on Computer Vision, October 2016, pp.483-499. https://doi.org/10.1007/978-3-319-46484-8_29.
Pishchulin L, Insafutdinov E, Tang S, Andres B, Andriluka M, Gehler P V, Schiele B. DeepCut: Joint subset partition and labeling for multi person pose estimation. In Proc. the 2016 IEEE Conference on Computer Vision and Pattern Recognition, June 2016, pp.4929-4937. https://doi.org/10.1109/CVPR.2016.533.
Insafutdinov E, Pishchulin L, Andres B, Andriluka M, Schiele B. DeeperCut: A deeper, stronger, and faster multi-person pose estimation model. In Proc. the 14th European Conference on Computer Vision, October 2016, pp.34-50. https://doi.org/10.1007/978-3-319-46466-4_3.
Cao Z, Simon T, Wei S E, Sheikh Y. Realtime multi-person 2D pose estimation using part affinity fields. In Proc. the 2017 IEEE Conference on Computer Vision and Pattern Recognition, July 2017, pp.1302-1310. https://doi.org/10.1109/CVPR.2017.143.
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In Proc. the 2016 IEEE Conference on Computer Vision and Pattern Recognition, June 2016, pp.770-778. https://doi.org/10.1109/CVPR.2016.90.
Li S, Fang Z, Song W, Hao A, Qin H. Bidirectional optimization coupled lightweight networks for efficient and robust multi-person 2D pose estimation. Journal of Computer Science and Technology, 2019, 34(3): 522-536. https://doi.org/10.1007/s11390-019-1924-x.

Author information

Authors and Affiliations

School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230027, China
Zhang-Jin Huang, Xiang-Xiang He, Fang-Jun Wang & Qing Shen
School of Data Science, University of Science and Technology of China, Hefei, 230027, China
Zhang-Jin Huang & Fang-Jun Wang
Anhui Province Key Laboratory of Software in Computing and Communication, Hefei, 230027, China
Zhang-Jin Huang

Corresponding author

Correspondence to Zhang-Jin Huang.

Supplementary Information

ESM 1

(PDF 560 kb)

About this article

Cite this article

Huang, ZJ., He, XX., Wang, FJ. et al. A Real-Time Multi-Stage Architecture for Pose Estimation of Zebrafish Head with Convolutional Neural Networks. J. Comput. Sci. Technol. 36, 434–444 (2021). https://doi.org/10.1007/s11390-021-9599-5

Received: 30 March 2019
Accepted: 03 June 2020
Published: 31 March 2021
Issue Date: April 2021
DOI: https://doi.org/10.1007/s11390-021-9599-5