Pre-Attentive and Attentive Detection of Humans in Wide-Field Scenes

Abstract

We address the problem of localizing and obtaining high-resolution footage of the people present in a scene. We propose a biologically-inspired solution combining pre-attentive, low-resolution sensing for detection with shiftable, high-resolution, attentive sensing for confirmation and further analysis.

The detection problem is made difficult by the unconstrained nature of realistic environments and human behaviour, and the low resolution of pre-attentive sensing. Analysis of human peripheral vision suggests a solution based on integration of relatively simple but complementary cues. We develop a Bayesian approach involving layered probabilistic modeling and spatial integration using a flexible norm that maximizes the statistical power of both dense and sparse cues. We compare the statistical power of several cues and demonstrate the advantage of cue integration. We evaluate the Bayesian cue integration method for human detection on a labelled surveillance database and find that it outperforms several competing methods based on conjunctive combinations of classifiers (e.g., Adaboost). We have developed a real-time version of our pre-attentive human activity sensor that generates saccadic targets for an attentive foveated vision system. Output from high-resolution attentive detection algorithms and gaze state parameters are fed back as statistical priors and combined with pre-attentive cues to determine saccadic behaviour. The result is a closed-loop system that fixates faces over a 130 deg field of view, allowing high-resolution capture of facial video over a large dynamic scene.
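To make the cue-integration idea concrete, the sketch below shows one way a naive-Bayes combination of pooled cue scores with a flexible (Lp) spatial norm could be organized: a small exponent averages a dense cue map, while a large exponent approaches the maximum and so preserves the evidence carried by a sparse cue. This is a minimal illustration under stated assumptions, not the paper's implementation; the cue names, pooling exponents, and one-dimensional log-likelihood-ratio models are hypothetical stand-ins for quantities that would be learned from labelled data.

```python
import numpy as np

def pnorm_pool(scores, p):
    """Pool non-negative per-pixel cue scores over a candidate region
    with an Lp norm.  p near 1 averages the map (suits dense cues such
    as motion energy); large p approaches the maximum (suits sparse
    cues such as isolated skin-colour detections)."""
    scores = np.asarray(scores, dtype=float)
    return np.mean(scores ** p) ** (1.0 / p)

def posterior_human(cue_maps, pooling_p, llr_models, log_prior_odds=0.0):
    """Naive-Bayes cue integration: pool each cue map, convert the pooled
    value to a log-likelihood ratio with a (hypothetical) learned 1-D
    model, sum across cues, add the prior log-odds, and squash to a
    posterior probability that a person is present in the region."""
    log_odds = log_prior_odds
    for name, scores in cue_maps.items():
        pooled = pnorm_pool(scores, pooling_p[name])
        log_odds += llr_models[name](pooled)
    return 1.0 / (1.0 + np.exp(-log_odds))

# Toy usage: hypothetical low-resolution cue maps for one candidate region.
rng = np.random.default_rng(0)
cue_maps = {
    "motion": rng.random((24, 8)),                         # dense cue
    "skin":   (rng.random((24, 8)) > 0.97).astype(float),  # sparse cue
}
pooling_p = {"motion": 1.0, "skin": 6.0}
llr_models = {                                             # hand-made stand-ins
    "motion": lambda v: 4.0 * (v - 0.5),
    "skin":   lambda v: 3.0 * (v - 0.2),
}
print(posterior_human(cue_maps, pooling_p, llr_models, log_prior_odds=-2.0))
```

In the closed-loop system described above, the prior term would carry the feedback from the high-resolution attentive detectors and the gaze state rather than the constant log-odds used in this toy example.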

Cite this article

Elder, J.H., Prince, S.J.D., Hou, Y. et al. Pre-Attentive and Attentive Detection of Humans in Wide-Field Scenes. Int J Comput Vision 72, 47–66 (2007). https://doi.org/10.1007/s11263-006-8892-7
