Improving wildlife tracking using 3D information
Introduction
Animal tracking helps biologists and ecologists derive how individual animals and animal populations move within and migrate across local areas, oceans and continents, and how animal abundances evolve in space and time. Camera traps are an appropriate technique for continuous animal tracking and automated 24/7 documentation.
Simultaneous localization, trajectory prediction and distance estimation are particularly relevant for several biological and ecological studies, such as estimating animal densities and abundances (Howe et al., 2017), animal actions and movement (Schindler and Steinhage, 2021), animal group and herd interactions and movements (Herbert-Read, 2016), animal behaviour and communication (Ravignani, 2018), and behavioural responses of animals to their environments (Wong and Candolin, 2014).
Conventional video tracking of animals using camera trapping is performed by locating animals in consecutive video frames using methods of computer vision. The positions of the located animals in each video frame are given in two dimensions (short: 2D), depicting the vertical and the horizontal position of an animal's visual appearance in each video frame. The distance or depth of an observed animal is not measured. Since multiple animals can be observed in each frame, animal tracking is an instance of the problem of Multiple Object Tracking (MOT), i.e. following the trajectories of different objects in an image sequence, usually a video clip. Summing up, conventional video tracking of animals is an instance of the so-called 2D MOT approach. In this study, we propose a 3D MOT approach using RGB-D video cameras to overcome typical shortcomings of conventional animal tracking such as changes in shape appearance, occlusions and distance ambiguities. 3D MOT means tracking the three-dimensional movements of multiple animals in the observed locations.
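The gain from an RGB-D camera can be made concrete: given a pixel position and its measured depth, the standard pinhole camera model back-projects the 2D image position into a 3D camera-frame coordinate, which resolves the distance ambiguity of purely 2D tracking. The sketch below uses placeholder intrinsics (`fx`, `fy`, `cx`, `cy` are assumptions, not the calibrated values of the D435):

```python
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with measured depth (metres) into a
    3D camera-frame point via the pinhole model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Hypothetical intrinsics for illustration only:
p = backproject(u=320, v=240, depth=2.0,
                fx=600.0, fy=600.0, cx=320.0, cy=240.0)
# A pixel at the principal point lies on the optical axis: p == [0, 0, 2.0]
```

Two animals that overlap in the image plane but stand at different depths map to clearly separated 3D points, which is exactly the ambiguity 2D MOT cannot resolve.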
Our 3D MOT approach to animal tracking makes three contributions:

- We propose a workflow for 3D MOT in the specific wildlife monitoring domain by integrating sophisticated preprocessing steps that transform raw RGB-D video data into three-dimensional representations of the observed scene and animals, so-called point clouds.
- We evaluate our wildlife 3D MOT approach against typical 2D MOT approaches, yielding increased detection and tracking performance.
- In a detailed study, we analyze the performance of typically used components of 3D MOT integrated into our specific wildlife tracking system.
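The first contribution, turning RGB-D frames into point clouds, can be sketched as a vectorized back-projection of an entire depth map. This is a minimal illustration, not the authors' implementation; the intrinsics are assumed, and invalid (zero) depth pixels are dropped:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Convert an HxW depth map (metres) into an (N, 3) point cloud,
    skipping pixels with invalid (zero) depth."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

# Toy 4x4 depth map with a single valid measurement:
depth = np.zeros((4, 4))
depth[1, 2] = 1.5
cloud = depth_to_point_cloud(depth, fx=500.0, fy=500.0, cx=2.0, cy=2.0)
# cloud.shape == (1, 3): one 3D point at roughly (0, -0.003, 1.5)
```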
2D multi-object tracking
In 2D MOT, approaches can be subdivided into detection-based tracking as well as online and offline tracking.
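The core step of detection-based tracking is associating the boxes of existing tracks with new per-frame detections. A common recipe, in the spirit of SORT (Simple Online and Realtime Tracking, listed in the references), solves a Hungarian assignment on a cost of one minus the intersection-over-union; the sketch below is a generic illustration, not the paper's exact tracker:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def associate(tracks, detections, min_iou=0.3):
    """Match track boxes to detections by Hungarian assignment on
    cost = 1 - IoU, keeping only matches above min_iou."""
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= min_iou]

tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(21, 21, 31, 31), (1, 1, 11, 11)]
print(associate(tracks, detections))  # [(0, 1), (1, 0)]
```

Unmatched detections spawn new tracks; tracks unmatched for several frames are terminated. In 3D MOT the same association runs on 3D boxes or point-cloud centroids instead of image boxes.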
Dataset
The data is delivered by an Intel® RealSense™ D435 infrared stereo camera installed at a stationary location in the Lindenthal Zoo in Cologne (Haucke and Steinhage, 2021). In daytime mode, the camera employs the RGB colour channel, but most animal activities are captured at nighttime or dawn. Therefore, this study focuses on clips captured at nighttime and dawn using the infrared mode and an IR lamp to light the scene sufficiently. Fig. 1 depicts one frame taken at dawn showing
Methods
The overall workflow is depicted in Fig. 2. We start with an RGB-D video clip, i.e. a sequence of frames where each frame comprises an image and a depth map. However, the depth maps are derived using a state-of-the-art approach to stereo analysis instead of the on-board chipset of the RealSense™ D435 stereo camera (cf. Section 4.1). To increase the temporal consistency of the frame-by-frame derived depth maps of the video clip, we apply conditional temporal median filtering (cf. Section 4.2). For each
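One plausible reading of conditional temporal median filtering, sketched below under that assumption (the paper's exact condition is described in its Section 4.2): a pixel's depth in the centre frame of a temporal window is replaced by the temporal median only when it deviates from that median by more than a threshold, so stable measurements pass through unchanged while flickering values are suppressed:

```python
import numpy as np

def conditional_temporal_median(depth_stack, threshold=0.1):
    """Filter the centre frame of a (T, H, W) depth stack: replace a
    pixel by its temporal median only where the centre frame deviates
    from that median by more than `threshold` (metres)."""
    centre = depth_stack[depth_stack.shape[0] // 2].copy()
    med = np.median(depth_stack, axis=0)
    unstable = np.abs(centre - med) > threshold
    centre[unstable] = med[unstable]
    return centre

stack = np.ones((5, 2, 2))
stack[2, 0, 0] = 3.0            # a flickering depth value in the centre frame
out = conditional_temporal_median(stack)
# out[0, 0] is restored to 1.0; all stable pixels keep their measured depth
```

The condition matters because an unconditional median would blur genuinely moving animals; only outlier depths are overwritten.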
Evaluation metrics
The evaluation employs all established 3D multi-object tracking metrics that were developed for the KITTI dataset benchmark: the CLEAR MOT metrics (Bernardin and Stiefelhagen, 2008). Initially, the benchmark only supported 2D MOT, but Weng et al. (2020) extended the approach to 3D and introduced new metrics to overcome intrinsic shortcomings of the original 2D MOT metrics.
Multiple Object Tracking Accuracy (MOTA) and Multiple Object Tracking Precision (MOTP) are the original 2D MOT metrics, defined by
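For reference, the standard CLEAR MOT accuracy of Bernardin and Stiefelhagen (2008) combines false negatives, false positives and identity switches, normalised by the number of ground-truth objects, summed over all frames:

```python
def mota(fn, fp, idsw, gt):
    """CLEAR MOT accuracy: MOTA = 1 - (FN + FP + IDSW) / GT, with
    per-frame counts of false negatives, false positives, identity
    switches and ground-truth objects summed over the sequence."""
    return 1.0 - (sum(fn) + sum(fp) + sum(idsw)) / sum(gt)

# Two frames with three ground-truth animals each, one miss, one ID switch:
print(mota(fn=[1, 0], fp=[0, 0], idsw=[0, 1], gt=[3, 3]))  # 0.6666666666666667
```

MOTP, by contrast, averages the localisation error (e.g. distance or box overlap) over all matched object-hypothesis pairs and so measures precision rather than tracking consistency.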
Conclusion
We developed a novel wildlife-specific 3D multi-object tracking (MOT) workflow using low-budget stereo camera traps. This includes the whole data processing pipeline from depth estimation and depth map processing to point cloud registration. By careful integration of 2D methods, we avoid complex 3D deep learning architectures and enable cost-effective data annotation for training the deep learning network. Despite this resource-saving implementation, the 3D
Declaration of Competing Interest
None
Acknowledgments
We gratefully acknowledge the German Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung, BMBF), Bonn, Germany, for funding (AMMOD - Automated Multisensor Stations for Monitoring of BioDiversity: FKZ 01LC1903B). We thank Thomas Ensch, Michael Gehlen and the entire team of the Lindenthaler Tierpark for their cooperation by hosting our experimental camera trap hardware on-site. We thank Timm Haucke for the technical work on the camera trap as well as providing
References

- et al. FlowNet3D: learning scene flow in 3D point clouds.
- et al. Multiple object tracking: a literature review. Artif. Intell. (2021).
- et al. Cooperative parallel particle filters for online model selection and applications to urban mobility. Digital Signal Process. (2017).
- et al. Identification of animals and recognition of their actions in wildlife videos using deep learning techniques. Ecol. Inform. (2021).
- et al. Optical flow and scene flow estimation: a survey. Pattern Recogn. (2021).
- et al. Tracking without bells and whistles.
- et al. Evaluating multiple object tracking performance: the CLEAR MOT metrics. EURASIP J. Image Video Process. (2008).
- et al. Simple online and realtime tracking.
- et al. End-to-end object detection with transformers.
- et al. DEFT: detection embeddings for tracking. (2021).
- Kalman filter for robot vision: a survey. IEEE Trans. Ind. Electron.
- CenterNet: keypoint triplets for object detection.
- A density-based algorithm for discovering clusters in large spatial databases with noise.
- Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM.
- Exploiting depth information for wildlife monitoring.
- Understanding how animal groups achieve coordinated movement. J. Exp. Biol.
- Distance sampling with camera traps. Methods Ecol. Evol.
- The Hungarian method for the assignment problem. Naval Res. Logist. Q.
- UnOVOST: unsupervised offline video object segmentation and tracking.