Deep MAnTra: deep learning-based multi-animal tracking for Japanese macaques

Pineda, Riza Rae; Kubo, Takatomi; Shimada, Masaki; Ikeda, Kazushi

doi:10.1007/s10015-022-00837-9

Original Article
Published: 15 December 2022

Volume 28, pages 127–138, (2023)

Artificial Life and Robotics

Riza Rae Pineda¹,², Takatomi Kubo¹, Masaki Shimada³ & Kazushi Ikeda¹

Abstract

Multi-instance object tracking is an active research problem in computer vision. Most novel methods locate and analyze targets in videos taken from static camera set-ups, as do many existing monitoring systems worldwide. Such methods have proved efficient and effective in established monitoring applications such as animal behavior studies and human and road traffic monitoring. However, despite the growing success of computer vision in animal monitoring and behavior analysis, no such system has yet been developed for free-ranging Japanese macaques. Our study therefore aims to establish a tracking system for Japanese macaques in their natural habitat. We begin by training a monkey detector using You Only Look Once (YOLOv4) and investigating the effects of different transfer learning techniques, curriculum learning, and dataset heterogeneity on the model’s accuracy. We then apply SuperGlue and Murty’s algorithm to the resulting bounding-box detections to re-identify individual monkeys across succeeding frames. The best-performing detection model, a YOLOv4 architecture with a spatial attention module and the Mish activation function trained on a 3-stage curriculum, achieved a mean \(AP^{50}\) of 96.59%, a precision of 93%, a recall of 96%, and a mean \(IOU_{AP@50}\) of 77.2%. The tracking system achieved a multiple object tracking accuracy (MOTA) of 91.35% even on our heterogeneous dataset, which suggests that it can be effective and reliable for animal behavior studies.
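The metrics quoted in the abstract (IoU, precision, recall, and MOTA) can be made concrete with a minimal Python sketch. It is not the authors' evaluation code. It assumes boxes are given as (x_min, y_min, x_max, y_max), uses the standard CLEAR MOT definition of MOTA, and all function names and counts are hypothetical illustrations rather than values from the paper.

```python
from typing import Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall from detection counts.

    For an AP50-style evaluation, a detection counts as a true positive
    when its IoU with a ground-truth box is at least 0.5.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def mota(misses: int, false_positives: int, id_switches: int,
         num_ground_truth: int) -> float:
    """CLEAR MOT accuracy: 1 - (misses + false positives + ID switches) / GT.

    All counts are summed over every frame of the sequence.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = misses + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the functions.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(precision_recall(tp=90, fp=10, fn=5))
    print(mota(misses=5, false_positives=3, id_switches=2,
               num_ground_truth=100))
```

Because MOTA subtracts every error type from 1, it can become negative when the tracker makes more errors than there are ground-truth objects, so it is not a percentage of correctly tracked animals in the strict sense.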


Acknowledgements

This work was partially supported by the joint project of Kyoto University and Toyota Motor Corporation, titled “Advanced Mathematical Science for Mobility Society”, and by JSPS KAKENHI Grant Numbers 17H05863 and 18K19821. The first author, R. R. Pineda, is supported by a postgraduate scholarship from the Engineering Research and Development for Technology (ERDT), Philippines.

Author information

Authors and Affiliations

Nara Institute of Science and Technology, Nara, Japan
Riza Rae Pineda, Takatomi Kubo & Kazushi Ikeda
Department of Computer Science, College of Engineering, University of the Philippines, Quezon City, Philippines
Riza Rae Pineda
Teikyo University of Science, Yamanashi, Japan
Masaki Shimada

Corresponding author

Correspondence to Riza Rae Pineda.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was submitted and accepted for the Journal Track of the joint symposium of the 28th International Symposium on Artificial Life and Robotics, the 8th International Symposium on BioComplexity, and the 6th International Symposium on Swarm Behavior and Bio-Inspired Robotics (Beppu, Oita, January 25–27, 2023).

About this article

Cite this article

Pineda, R.R., Kubo, T., Shimada, M. et al. Deep MAnTra: deep learning-based multi-animal tracking for Japanese macaques. Artif Life Robotics 28, 127–138 (2023). https://doi.org/10.1007/s10015-022-00837-9

Received: 31 August 2022
Accepted: 07 November 2022
Published: 15 December 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s10015-022-00837-9
