Abstract
Multi-instance object tracking is an active research problem in computer vision. Most recent methods detect and locate targets in videos recorded by static cameras, the same set-up used by many existing monitoring systems worldwide, and such systems have proved efficient and effective in established applications such as animal behavior studies and human and road traffic monitoring. However, despite the growing success of computer vision in animal monitoring and behavior analysis, no such tracking system has yet been developed for free-ranging Japanese macaques. This study therefore aims to establish a tracking system for Japanese macaques in their natural habitat. We first train a monkey detector using You Only Look Once (YOLOv4) and investigate how different transfer learning techniques, curriculum learning, and dataset heterogeneity affect the model's accuracy. We then use the bounding boxes produced by this detector, together with SuperGlue and Murty's algorithm, to re-identify individual monkeys across succeeding frames. Our best-performing Japanese macaque detection model, a YOLOv4 architecture with a spatial attention module and the Mish activation function trained with a three-stage curriculum, achieved a mean \(AP^{50}\) of 96.59%, a precision of 93%, a recall of 96%, and a mean \(IOU_{AP@50}\) of 77.2%. With a multiple object tracking accuracy (MOTA) of 91.35%, even on our heterogeneous dataset, our tracking system can prove effective and reliable for animal behavior studies.
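The abstract reports detection quality through IoU-based average precision (a detection counts as correct at \(AP^{50}\) when its IoU with a ground-truth box meets the 0.5 threshold) and tracking quality through MOTA. As a minimal sketch, assuming boxes in (x1, y1, x2, y2) corner format and per-frame error counts that have already been obtained by matching tracker output to ground truth, both quantities can be computed as below. The function names `iou` and `mota` are illustrative and are not the authors' evaluation code; MOTA follows the CLEAR MOT definition (Bernardin and Stiefelhagen 2008), \(\mathrm{MOTA} = 1 - \sum_t (\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t) / \sum_t \mathrm{GT}_t\).

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mota(fn: Sequence[int], fp: Sequence[int],
         idsw: Sequence[int], gt: Sequence[int]) -> float:
    """CLEAR MOT accuracy from per-frame misses, false positives,
    identity switches and ground-truth object counts."""
    total_gt = sum(gt)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    errors = sum(fn) + sum(fp) + sum(idsw)
    return 1.0 - errors / total_gt


if __name__ == "__main__":
    # Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / 7.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    # Two frames, 10 monkeys each, 3 errors in total.
    print(mota(fn=[1, 0], fp=[0, 1], idsw=[0, 1], gt=[10, 10]))
```

Because every miss, false positive, and identity switch is subtracted from the same ground-truth total, MOTA is at most 1 but can become negative when a tracker produces more errors than there are ground-truth objects.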
References
Meyer JS, Hamel AF (2014) Models of stress in nonhuman primates and their relevance for human psychopathology and endocrine dysfunction. ILAR J 55(2):347–360
Willard SL, Shively CA (2012) Modeling depression in adult female cynomolgus monkeys (Macaca fascicularis). Am J Primatol 74(6):528–542
Matsuzawa T (2018) Hot-spring bathing of wild monkeys in Shiga-Heights: origin and propagation of a cultural behavior. Primates 59(3):209–213
Kawai M (1965) Newly-acquired pre-cultural behavior of the natural troop of Japanese monkeys on Koshima islet. Primates 6(1):1–30
Kawamura S (1959) The process of sub-culture propagation among Japanese macaques. Primates 2(1):43–60
Matsuzawa T (2015) Sweet-potato washing revisited: 50th anniversary of the Primates article. Primates 56(4):285–287
Girshick RB (2015) Fast R-CNN. In: 2015 IEEE international conference on computer vision (ICCV). IEEE, pp 1440–1448
Bochkovskiy A, Wang CY, Liao HYM (2020) YOLOv4: optimal speed and accuracy of object detection. CoRR (abs/2004.10934). https://doi.org/10.48550/arXiv.2004.10934
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, pp 779–788
Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. CoRR (abs/1804.02767). https://doi.org/10.48550/arXiv.1804.02767
Wang CY, Bochkovskiy A, Liao HYM (2021) Scaled-YOLOv4: Scaling cross stage partial network. In: 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR). IEEE, pp 13024–13033
Tan M, Pang R, Le QV (2020) EfficientDet: scalable and efficient object detection. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR). IEEE, pp 10778–10787
Lin TY et al (2014) Microsoft COCO: common objects in context. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer vision – ECCV. Lecture notes in computer science, vol 8693. Springer, Cham, pp 740–755
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 248–255
Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2010) The PASCAL visual object classes (VOC) challenge. Int J Comput Vis 88(2):303–338
Bozinovski S (2020) Reminder of the first paper on transfer learning in neural networks, 1976. Informatica (Slovenia) 44. https://doi.org/10.31449/inf.v44i3.2828
Zhuang F et al (2020) A comprehensive survey on transfer learning. Proc IEEE 109(1):43–76
Bengio Y, Louradour J, Collobert R, Weston J (2009) Curriculum learning. In: 26th annual international conference on machine learning (ICML'09). ACM, pp 41–48
Soviany P, Ionescu RT, Rota P et al (2022) Curriculum learning: a survey. Int J Comput Vis 130:1526–1565
Clapham M, Miller E, Nguyen M, Darimont CT (2020) Automated facial recognition for wildlife that lack unique markings: a deep learning approach for brown bears. Ecol Evol 10(23):12883–12892
McIntosh D, Marques TP, Albu AB, Rountree R, Leo FD (2020) Movement tracks for the automatic detection of fish behavior in videos. CoRR (abs/2011.14070). https://doi.org/10.48550/arXiv.2011.14070
Sarfati R, Hayes J, Sarfati E, Peleg O (2020) Spatio-temporal reconstruction of emergent flash synchronization in firefly swarms via stereoscopic 360-degree cameras. J R Soc Interface 17:20200179
Labuguen R, Matsumoto J, Negrete SB, Nishimaru H, Nishijo H, Takada M, Go Y, Inoue K-i, Shibata T (2021) MacaquePose: a novel “in the wild” macaque monkey pose dataset for markerless motion capture. Front Behav Neurosci 14:268
Schofield D, Nagrani A, Zisserman A, Hayashi M, Matsuzawa T, Biro D, Carvalho S (2019) Chimpanzee face recognition from videos in the wild using deep learning. Sci Adv 5(9):eaaw0736
Sarlin PE, DeTone D, Malisiewicz T, Rabinovich A (2020) SuperGlue: learning feature matching with graph neural networks. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR). IEEE, pp 4937–4946
Crouse DF (2016) On implementing 2D rectangular assignment algorithms. IEEE Trans Aerosp Electron Syst 52(4):1679–1696
He K, Zhang X, Ren S, Sun J (2014) Spatial pyramid pooling in deep convolutional networks for visual recognition. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer vision – ECCV. Lecture notes in computer science, vol 8691. Springer, Cham, pp 346–361
Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: 2018 IEEE/CVF conference on computer vision and pattern recognition (CVPR). IEEE, pp 8759–8768
Misra D (2020) Mish: a self regularized non-monotonic neural activation function. In: 2020 British machine vision conference (BMVC). https://doi.org/10.48550/arXiv.1908.08681
Ramachandran P, Zoph B, Le QV (2018) Searching for activation functions. In: 2018 International conference on learning representations (ICLR) workshop. https://doi.org/10.48550/arXiv.1710.05941
Xu B, Wang N, Chen T, Li M (2015) Empirical evaluation of rectified activations in convolutional network. In: 2015 International conference on machine learning (ICML) workshop. https://doi.org/10.48550/arXiv.1505.00853
Woo S, Park J, Lee J, Kweon IS (2018) CBAM: convolutional block attention module. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer vision – ECCV. Lecture notes in computer science, vol 11211. Springer, Cham, pp 3–19
Krueger KA, Dayan P (2009) Flexible shaping: How learning in small steps helps. Cognition 110(3):380–394
Shimada M, Sueur C (2018) Social play among juvenile wild Japanese macaques (Macaca fuscata) strengthens their social bonds. Am J Primatol 80(1):e22728
Shimada M, Uno T, Nakagawa N, Fujita S, Izawa K (2009) Case study of a one-sided attack by multiple troop members on a nontroop adolescent male and the death of Japanese macaques (Macaca fuscata). Aggress Behav 35(4):334–341
Bernardin K, Stiefelhagen R (2008) Evaluating multiple object tracking performance: the CLEAR MOT metrics. EURASIP J Image Video Process 2008:246309. https://doi.org/10.1155/2008/246309
Acknowledgements
This work was partially supported by the joint project of Kyoto University and Toyota Motor Corporation, titled “Advanced Mathematical Science for Mobility Society”, and by JSPS KAKENHI Grant Numbers 17H05863 and 18K19821. The first author, R. R. Pineda, is supported by a postgraduate scholarship from the Engineering Research and Development for Technology (ERDT), Philippines.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work was submitted and accepted for the Journal Track of the joint symposium of the 28th International Symposium on Artificial Life and Robotics, the 8th International Symposium on BioComplexity, and the 6th International Symposium on Swarm Behavior and Bio-Inspired Robotics (Beppu, Oita, January 25–27, 2023).
About this article
Cite this article
Pineda, R.R., Kubo, T., Shimada, M. et al. Deep MAnTra: deep learning-based multi-animal tracking for Japanese macaques. Artif Life Robotics 28, 127–138 (2023). https://doi.org/10.1007/s10015-022-00837-9
DOI: https://doi.org/10.1007/s10015-022-00837-9