Improving wildlife tracking using 3D information

https://doi.org/10.1016/j.ecoinf.2021.101535

Highlights

  • A completely automated workflow for three-dimensional tracking of animals in wildlife monitoring.

  • Careful integration of image-based processing steps avoids large, complex three-dimensional deep learning networks and enables cost-effective data annotation for training.

  • 3D tracking outperforms 2D tracking on sAMOTA (+2%) and AMOTA (+9%), as well as on most other MOT metrics.

  • The 3D Kalman filter outperforms the scene flow approach as a motion model, with +4% on sAMOTA and +9% on MOTA.

Abstract

The monitoring of wildlife populations is of growing importance due to the worldwide endangerment of many species, global climate change, and land cover change. Wildlife monitoring by camera traps is an established and non-invasive standard approach to quantify species diversity, estimate occupancy and relative abundance, and document animal behaviour around the clock. We propose a novel wildlife-specific 3D multi-object tracking workflow using inexpensive stereo camera traps. By carefully embedding efficient 2D methods into the overall 3D workflow, we avoid costly processing of complex 3D data structures (i.e., 3D point clouds) while significantly outperforming typical 2D tracking approaches in terms of internationally established multi-object tracking metrics, i.e., with respect to the reliability and accuracy of the tracking results. The code is available at https://github.com/m-klasen/3d_wildlife-tracking

Introduction

Animal tracking helps biologists and ecologists to derive how individual animals and animal populations move within and migrate across local areas, oceans and continents, and how animal abundances evolve in space and time. Camera traps are an appropriate technique for continuous animal tracking with automated 24/7 documentation.

Simultaneous localization, trajectory prediction and distance estimation are particularly relevant for several biological and ecological studies, such as estimating animal densities and abundances (Howe et al., 2017), analysing animal actions and movement (Schindler and Steinhage, 2021), animal group and herd interactions and movements (Herbert-Read, 2016), animal behaviour and communication (Ravignani, 2018), and behavioural responses of animals to environments (Wong and Candolin, 2014).

Conventional video tracking of animals using camera trapping is performed by locating animals in consecutive video frames using methods of computer vision. The positions of the located animals in each video frame are given in two dimensions (short: 2D), depicting the vertical and horizontal position of an animal's visual appearance in each video frame. The distance or depth of an observed animal is not measured. Since multiple animals can be observed in each frame, animal tracking is an instance of the problem of Multiple Object Tracking (MOT), i.e., following the trajectories of different objects in an image sequence, usually video clips. In summary, conventional video tracking of animals is an instance of the so-called 2D MOT approach. In this study, we propose a 3D MOT approach using RGB-D video cameras to overcome typical shortcomings of conventional animal tracking such as changes in shape and appearance, occlusions, and distance ambiguities. 3D MOT means tracking the three-dimensional movements of multiple animals in the observed locations.

Our 3D MOT approach to animal tracking makes three contributions:

  • We propose a workflow for 3D MOT in the specific wildlife monitoring domain by integrating sophisticated preprocessing steps that transform raw RGB-D video data into three-dimensional representations of the observed scene and animals, the so-called point clouds (a back-projection sketch follows this list).

  • We evaluate our wildlife 3D MOT approach against typical 2D MOT approaches, yielding increased detection and tracking performance.

  • In a detailed study, we analyze the performance of typically used components of 3D MOT integrated into our specific wildlife tracking system.
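The first contribution hinges on converting each depth map into a point cloud. The excerpt does not show the exact conversion, but under a standard pinhole camera model the back-projection is straightforward; a minimal sketch, where the intrinsics fx, fy, cx, cy stand in for the camera's actual calibration:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (H x W, in metres) into an N x 3 point
    cloud using the standard pinhole camera model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop invalid (zero-depth) pixels
```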

Section snippets

2D multi object tracking

In 2D MOT, approaches can be subdivided into detection-based tracking, as well as online and offline tracking.
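In detection-based tracking, detections produced independently in each frame are linked across frames by solving an assignment problem, classically with the Hungarian method (Kuhn, 1955). A minimal sketch of one association step, assuming axis-aligned 2D boxes, an IoU-based cost, and an illustrative matching threshold:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian method

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, iou_threshold=0.3):
    """Match existing track boxes to new detections; returns index pairs."""
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols)
            if cost[r, c] <= 1.0 - iou_threshold]
```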

Dataset

The data is delivered by an Intel® RealSense™ D435 infrared stereo camera installed at one stationary location in the Lindenthal Zoo in Cologne (Haucke and Steinhage, 2021). In daytime mode, the camera employs the RGB color channels, but most animal activity is captured at night or at dawn. Therefore, this study focuses on the clips captured at night and at dawn using the infrared mode and an IR lamp to light the scene sufficiently. Fig. 1 depicts one frame taken at dawn showing
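For orientation, grabbing the D435's infrared stereo pair and on-board depth might look as follows. This is only a sketch using the pyrealsense2 bindings; the resolution, frame rate and stream choice are assumptions rather than the settings used in the paper, which derives its depth maps off-board from the stereo pair instead of from the camera's chipset (cf. Methods):

```python
import numpy as np
import pyrealsense2 as rs  # Intel RealSense SDK Python bindings

pipeline = rs.pipeline()
config = rs.config()
# Left and right infrared streams plus on-board depth (settings assumed).
config.enable_stream(rs.stream.infrared, 1, 848, 480, rs.format.y8, 30)
config.enable_stream(rs.stream.infrared, 2, 848, 480, rs.format.y8, 30)
config.enable_stream(rs.stream.depth, 848, 480, rs.format.z16, 30)
pipeline.start(config)
try:
    frames = pipeline.wait_for_frames()
    left = np.asanyarray(frames.get_infrared_frame(1).get_data())   # uint8
    right = np.asanyarray(frames.get_infrared_frame(2).get_data())  # uint8
    depth = np.asanyarray(frames.get_depth_frame().get_data())      # uint16
finally:
    pipeline.stop()
```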

Methods

The overall workflow is depicted in Fig. 2. We start with an RGB-D video clip, i.e., a sequence of frames where each frame comprises an image and a depth map. However, the depth maps are derived using a state-of-the-art approach to stereo analysis instead of using the on-board chip-set of the RealSense™ D435 stereo camera (cf. Section 4.1). To increase the temporal consistency of the frame-by-frame derived depth maps of the video clip, we apply conditional temporal median filtering (cf. Section 4.2). For each
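The excerpt does not spell out the conditional temporal median filter. One plausible reading is that a pixel is replaced by the temporal median of a sliding window only where it deviates strongly from that median; a minimal sketch under that assumption, with window size and threshold as illustrative values:

```python
import numpy as np

def conditional_temporal_median(depth_stack, threshold=0.1):
    """depth_stack: (T, H, W) depth maps from consecutive frames.
    Replace the centre frame's pixels by the temporal median wherever
    they deviate from it by more than `threshold` (here: metres)."""
    median = np.median(depth_stack, axis=0)
    centre = depth_stack[depth_stack.shape[0] // 2].copy()
    outliers = np.abs(centre - median) > threshold
    centre[outliers] = median[outliers]
    return centre
```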

Evaluation metrics

The evaluation employs all established 3D multi-object tracking metrics that were developed for the KITTI dataset benchmark: the CLEAR MOT metrics (Bernardin and Stiefelhagen, 2008). Initially, the benchmark only supported 2D MOT, but Weng et al. (2020) extended the approach to 3D and introduced new metrics to overcome intrinsic shortcomings of the original 2D MOT metrics.

Multi-Object Tracking Accuracy (MOTA) and MOT Precision (MOTP) are the original 2D MOT metrics. MOTA is defined by

MOTA = 1 - (FN + FP + IDS) / num_gt

where FN, FP and IDS are the total numbers of false negatives, false positives and identity switches over all frames, and num_gt is the total number of ground-truth objects.
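With the error counts accumulated over all frames, MOTA is directly computable; a trivial sketch following that definition:

```python
def mota(fn, fp, ids, num_gt):
    """CLEAR MOT accuracy: 1 - (FN + FP + IDS) / num_gt, with all
    counts summed over the whole sequence."""
    return 1.0 - (fn + fp + ids) / num_gt

# Example: mota(fn=120, fp=80, ids=5, num_gt=2000) -> 0.8975
```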

Conclusion

We developed a novel wildlife-specific 3D multi-object tracking (MOT) workflow using low-budget stereo camera traps. This includes the whole data processing pipeline, from depth estimation and depth map processing to point cloud registration. By careful integration of 2D methods, we avoid large, complex 3D deep learning architectures and enable cost-effective data annotation for the training of the deep learning network. Despite this resource-saving implementation, the 3D
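As a companion to the highlight on motion models: a 3D Kalman filter with a constant-velocity state over box centroids is the standard choice in 3D MOT (Weng et al., 2020). A minimal sketch, with the state layout and noise magnitudes as assumptions rather than the paper's tuned values:

```python
import numpy as np

class CentroidKalman3D:
    """Constant-velocity Kalman filter over a 3D box centroid.
    State: [x, y, z, vx, vy, vz]; noise magnitudes are assumptions."""
    def __init__(self, xyz, dt=1.0):
        self.x = np.hstack([xyz, np.zeros(3)])               # state
        self.P = np.eye(6) * 10.0                            # covariance
        self.F = np.eye(6); self.F[:3, 3:] = np.eye(3) * dt  # motion model
        self.H = np.eye(3, 6)                                # observe position
        self.Q = np.eye(6) * 0.01                            # process noise
        self.R = np.eye(3) * 0.1                             # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:3]

    def update(self, z):
        y = z - self.H @ self.x                              # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)             # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P
```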

Declaration of Competing Interest

None

Acknowledgments

We gratefully acknowledge the German Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung, BMBF), Bonn, Germany (AMMOD - Automated Multisensor Stations for Monitoring of BioDiversity: FKZ 01LC1903B) for funding. We thank Thomas Ensch, Michael Gehlen and the entire team of the Lindenthaler Tierpark for their cooperation in hosting our experimental camera trap hardware on-site. We thank Timm Haucke for the technical work on the camera trap as well as providing

References (38)

  • S.Y. Chen

    Kalman filter for robot vision: a survey

    IEEE Trans. Ind. Electron.

    (2012)
  • K. Duan et al.

CenterNet: Keypoint triplets for object detection

  • M. Ester et al.

    A density-based algorithm for discovering clusters in large spatial databases with noise

  • M.A. Fischler et al.

    Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography

    Commun. ACM

    (1981)
  • T. Haucke et al.

    Exploiting Depth Information for Wildlife Monitoring

    (2021)
  • J.E. Herbert-Read

    Understanding how animal groups achieve coordinated movement

    J. Exp. Biol.

    (2016)
  • E.J. Howe et al.

    Distance sampling with camera traps

    Methods Ecol. Evol.

    (2017)
  • H.W. Kuhn

The Hungarian method for the assignment problem

Naval Res. Logistics Quarterly

    (1955)
  • J. Luiten et al.

UnOVOST: Unsupervised offline video object segmentation and tracking
