Abstract
We address the problem of localizing and obtaining high-resolution footage of the people present in a scene. We propose a biologically-inspired solution combining pre-attentive, low-resolution sensing for detection with shiftable, high-resolution, attentive sensing for confirmation and further analysis.
The detection problem is made difficult by the unconstrained nature of realistic environments and human behaviour, and by the low resolution of pre-attentive sensing. Analysis of human peripheral vision suggests a solution based on the integration of relatively simple but complementary cues. We develop a Bayesian approach involving layered probabilistic modeling and spatial integration using a flexible norm that maximizes the statistical power of both dense and sparse cues. We compare the statistical power of several cues and demonstrate the advantage of cue integration. We evaluate the Bayesian cue integration method for human detection on a labelled surveillance database and find that it outperforms several competing methods based on conjunctive combinations of classifiers (e.g., AdaBoost). We have developed a real-time version of our pre-attentive human activity sensor that generates saccadic targets for an attentive foveated vision system. Output from high-resolution attentive detection algorithms and gaze state parameters are fed back as statistical priors and combined with pre-attentive cues to determine saccadic behaviour. The result is a closed-loop system that fixates faces over a 130° field of view, allowing high-resolution capture of facial video over a large dynamic scene.
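As a rough illustration of the two ideas named in the abstract (Bayesian cue integration and spatial pooling with a flexible norm), the following minimal sketch may help. It is not the paper's implementation: the abstract does not give the cue models or the form of the norm, so the sketch assumes conditionally independent cues combined as log likelihood ratios and a Minkowski (L_p) generalised mean for pooling. The function names `posterior` and `minkowski_pool` and all numeric values are hypothetical.

```python
import math


def posterior(cue_log_lrs, prior=0.5):
    """Posterior probability of 'human present' from per-cue log likelihood ratios.

    Assumption (not from the paper): cues are conditionally independent,
    so their log likelihood ratios simply add to the prior log odds.
    """
    log_odds = math.log(prior / (1.0 - prior)) + sum(cue_log_lrs)
    return 1.0 / (1.0 + math.exp(-log_odds))


def minkowski_pool(values, p):
    """Pool non-negative cue responses over a spatial window.

    Assumption (not from the paper): the 'flexible norm' is modelled here
    as an L_p generalised mean. p = 1 is the plain average, which favours
    dense cues; large p approaches the maximum, which favours sparse cues.
    """
    n = len(values)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


# Hypothetical window with a single strong response (a sparse cue).
sparse = [0.0, 0.0, 0.0, 4.0]
print(minkowski_pool(sparse, 1))   # averaging dilutes the isolated response
print(minkowski_pool(sparse, 50))  # a high exponent stays close to the peak

# Two hypothetical cues: one favours 'human' 4:1, the other 'background' 2:1.
print(posterior([math.log(4.0), math.log(0.5)]))
```

Under these assumptions, the exponent `p` is the single parameter that trades off sensitivity to dense versus sparse evidence, which is one plausible reading of a norm chosen per cue to maximise statistical power.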
Cite this article
Elder, J.H., Prince, S.J.D., Hou, Y. et al. Pre-Attentive and Attentive Detection of Humans in Wide-Field Scenes. Int J Comput Vision 72, 47–66 (2007). https://doi.org/10.1007/s11263-006-8892-7
DOI: https://doi.org/10.1007/s11263-006-8892-7