Abstract:
In typical visual surveillance implementations, observations of scene objects are extracted as regions of moving pixels identified by motion detection algorithms based on pixel differencing. These observations are tracked to establish their temporal coherence by updating a state vector describing the projected 2D width and height as well as the image trajectory. Such an approach is particularly vulnerable to fragmentation and occlusion because there is essentially no appearance model. The objective of this work is to develop simple but highly discriminatory models of scene objects which indirectly use the depth of the object to model its projected width and height. Rather than relying on a time-consuming, labour-intensive and expert-dependent calibration procedure to recover the full image-to-ground-plane homography, the system relies on a simple learning procedure: it watches several hundred objects entering, passing through and leaving the monitored view volume, and from these observations recovers the relationship between the projected 2D width and height of an object and its image position and visual motion.
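As a rough illustration of the kind of learning procedure described, the sketch below fits projected width and height as linear functions of an object's image row using ordinary least squares. The linear form, the use of image row alone (ignoring visual motion), the `(y, width, height)` observation format and all function names are assumptions made for illustration; the abstract does not specify the model actually used.

```python
from statistics import mean


def fit_line(ys, sizes):
    """Ordinary least-squares fit of size = slope * y + intercept."""
    y_bar, s_bar = mean(ys), mean(sizes)
    var = sum((y - y_bar) ** 2 for y in ys)
    if var == 0:
        raise ValueError("need observations at more than one image row")
    cov = sum((y - y_bar) * (s - s_bar) for y, s in zip(ys, sizes))
    slope = cov / var
    return slope, s_bar - slope * y_bar


def learn_size_model(observations):
    """Learn width and height models from tracked observations.

    observations: iterable of (y, width, height) tuples in pixels,
    a hypothetical format chosen for this sketch.
    """
    obs = list(observations)
    ys = [o[0] for o in obs]
    return {
        "width": fit_line(ys, [o[1] for o in obs]),
        "height": fit_line(ys, [o[2] for o in obs]),
    }


def predict_size(model, y):
    """Predict the expected (width, height) of an object at image row y."""
    (w_slope, w_icpt), (h_slope, h_icpt) = model["width"], model["height"]
    return w_slope * y + w_icpt, h_slope * y + h_icpt
```

A tracker could compare the predicted size at an observation's position with the measured bounding box to flag likely fragmentation or occlusion, although how the learned relationship is used downstream is not stated in the abstract.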
Date of Conference: 22-25 September 2002
Date Added to IEEE Xplore: 10 December 2002
Print ISBN: 0-7803-7622-6
Print ISSN: 1522-4880