Authors:
Anoushka Banerjee, Dileep Aroor Dinesh and Arnav Bhavsar
Affiliation:
MANAS Lab, SCEE, Indian Institute of Technology Mandi, Kamand, H.P., India
Keyword(s):
Camera Trap, Empty Frames, Wildlife Detection, Domain Adaptation, Domain Generalization, Vision Transformer (ViT), Faster Region-Based Convolutional Neural Network (Faster R-CNN) and DEtection TRansformer (DETR).
Abstract:
Camera trap sequences are a treasure trove of wildlife data. However, camera traps are susceptible to false triggers caused by ground heat flux and wind, which lead to empty frames. Empty frames are also generated when the animal moves out of the camera's field of view before a shot is fired. The time lost in manually sieving through the surfeit of empty frames restricts the usage of camera trap data. Camouflage, occlusion, motion blur, poor illumination, and small regions of interest not only make wildlife subject detection a difficult task for human experts but also add to the challenge of sifting empty frames from animal-containing frames. Thus, in this work, we attempt to automate empty frame removal and animal detection in camera trap sequences using deep learning algorithms such as the vision transformer (ViT), faster region-based convolutional neural networks (Faster R-CNN), and the DEtection TRansformer (DETR). Each biodiversity hotspot has its own characteristic seasonal variations and distribution of flora and fauna, which underscores the need for domain generalization and adaptation in the leveraged deep learning algorithms. Therefore, we address the challenge of adapting our models to a few locations and generalizing to unseen locations where training data is scarce.
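A minimal sketch of the empty-frame filtering idea is given below, using a COCO-pretrained Faster R-CNN from torchvision to flag frames with no confident detection. The score threshold, file paths, and the use of off-the-shelf weights are assumptions for illustration only; the paper's models are trained and adapted on camera trap data.

```python
# Sketch: mark a camera-trap frame as "empty" when a pretrained Faster R-CNN
# returns no detection above a confidence threshold (illustrative values only).
import torch
import torchvision
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

# COCO-pretrained detector; the paper fine-tunes/adapts its detectors instead.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def is_empty_frame(image_path: str, score_threshold: float = 0.5) -> bool:
    """Return True if no object is detected above score_threshold."""
    image = convert_image_dtype(read_image(image_path), dtype=torch.float)
    with torch.no_grad():
        prediction = model([image])[0]
    return bool((prediction["scores"] > score_threshold).sum() == 0)

# Hypothetical usage: keep only frames that are not empty.
# frames = ["seq_001/frame_0001.jpg", "seq_001/frame_0002.jpg"]
# animal_frames = [f for f in frames if not is_empty_frame(f)]
```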