Abstract
With the development of computer vision technology, many advanced computer vision methods have been successfully applied to animal detection, tracking, recognition and behavior analysis, which greatly helps ecological protection, biodiversity conservation and environmental protection. Because existing target tracking datasets contain various kinds of common objects but rarely focus on wild animals, this paper proposes the first benchmark focused on wild animals, named Wild Animal Tracking Benchmark (WATB), to encourage further progress in the research and application of visual object tracking. WATB contains more than 203,000 frames and 206 video sequences, and covers different kinds of animals from land, sea and sky. The average video length is over 980 frames. Each video is manually labelled with thirteen challenge attributes, including illumination variation, rotation and deformation. All frames in the dataset are annotated with axis-aligned bounding boxes. To reveal the performance of existing tracking algorithms and provide baseline results for future research on wild animal tracking, we benchmark a total of 38 state-of-the-art trackers and rank them according to tracking accuracy. The evaluation results demonstrate that trackers based on deep networks perform much better than other trackers, such as those based on correlation filters. The results also show that wild animal tracking remains a major challenge for the computer vision community. The benchmark WATB and the evaluation results are released on the project website https://w-1995.github.io/.
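The abstract does not state which accuracy measure is used for ranking. As an illustration only, the sketch below assumes the overlap-based success measure common in tracking benchmarks: the intersection over union (IoU) between a predicted and an annotated axis-aligned box, and the fraction of frames whose IoU exceeds a threshold. The `(x, y, width, height)` box layout, the function names and the 0.5 threshold are assumptions made for this example, not details taken from WATB.

```python
from typing import Sequence, Tuple

# Assumed box layout: (x, y, width, height) with (x, y) the top-left corner.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_rate(
    predictions: Sequence[Box],
    annotations: Sequence[Box],
    threshold: float = 0.5,  # hypothetical threshold chosen for illustration
) -> float:
    """Fraction of frames whose IoU with the annotation exceeds the threshold."""
    if len(predictions) != len(annotations):
        raise ValueError("one prediction is required per annotated frame")
    if not predictions:
        return 0.0
    hits = sum(iou(p, g) > threshold for p, g in zip(predictions, annotations))
    return hits / len(predictions)
```

Under these assumptions, a tracker could be scored per sequence with `success_rate` and the per-sequence scores averaged to produce a ranking; the benchmark's own toolkit may differ.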
Availability of Data and Materials
The datasets generated and/or analysed during the current study are available on the project website: https://w-1995.github.io/.
Notes
For the abbreviations, please refer to Table 7 in the supplementary part.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grants 61972068 and 61976042, the Innovative Talents Program for Liaoning Universities under Grant LR2019020, and the Liaoning Revitalization Talents Program under Grant XLYC2007023.
Author information
Contributions
FW and FS conceived this study. FW wrote the initial manuscript, and FS reviewed and edited it. The other four authors took part in the construction of WATB. PC and FL were responsible for tracker evaluation. XW was responsible for building the project website.
Ethics declarations
Conflict of interest
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Additional information
Communicated by Hyun Soo Park.
About this article
Cite this article
Wang, F., Cao, P., Li, F. et al. WATB: Wild Animal Tracking Benchmark. Int J Comput Vis 131, 899–917 (2023). https://doi.org/10.1007/s11263-022-01732-3
DOI: https://doi.org/10.1007/s11263-022-01732-3