Pre-Attentive and Attentive Detection of Humans in Wide-Field Scenes

Elder, J. H.; Prince, S. J. D.; Hou, Y.; Sizintsev, M.; Olevskiy, E.

doi:10.1007/s11263-006-8892-7

Published: 17 July 2006

International Journal of Computer Vision, Volume 72, pages 47–66 (2007)

Abstract

We address the problem of localizing and obtaining high-resolution footage of the people present in a scene. We propose a biologically inspired solution combining pre-attentive, low-resolution sensing for detection with shiftable, high-resolution, attentive sensing for confirmation and further analysis.

The detection problem is made difficult by the unconstrained nature of realistic environments and human behaviour, and the low resolution of pre-attentive sensing. Analysis of human peripheral vision suggests a solution based on integration of relatively simple but complementary cues. We develop a Bayesian approach involving layered probabilistic modeling and spatial integration using a flexible norm that maximizes the statistical power of both dense and sparse cues. We compare the statistical power of several cues and demonstrate the advantage of cue integration. We evaluate the Bayesian cue integration method for human detection on a labelled surveillance database and find that it outperforms several competing methods based on conjunctive combinations of classifiers (e.g., AdaBoost). We have developed a real-time version of our pre-attentive human activity sensor that generates saccadic targets for an attentive foveated vision system. Output from high-resolution attentive detection algorithms and gaze state parameters are fed back as statistical priors and combined with pre-attentive cues to determine saccadic behaviour. The result is a closed-loop system that fixates faces over a 130° field of view, allowing high-resolution capture of facial video over a large dynamic scene.
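
The cue integration idea in the abstract can be illustrated with a small sketch. The abstract does not specify the cue models or the form of the "flexible norm", so everything here is an assumption made for illustration: cues are treated as conditionally independent (so their log-likelihood ratios add), spatial pooling uses a Minkowski p-norm, and all function names are hypothetical. It is not the authors' implementation.

```python
import math


def log_likelihood_ratio(p_given_human, p_given_background):
    """Log-likelihood ratio of one cue observation (hypothetical cue model)."""
    return math.log(p_given_human / p_given_background)


def combine_cues(llrs):
    """Fuse cues by summing log-likelihood ratios.

    Assumes the cues are conditionally independent given the class;
    the abstract does not state this assumption.
    """
    return sum(llrs)


def minkowski_pool(values, p):
    """Pool non-negative responses over a spatial window with a p-norm.

    p = 1 sums the window, which favours dense cues; large p approaches
    the maximum, which favours sparse cues. The p-norm is an assumed
    stand-in for the paper's "flexible norm".
    """
    if math.isinf(p):
        return max(values)
    return sum(v ** p for v in values) ** (1.0 / p)


def posterior(prior, llr):
    """Posterior probability of 'human' from a prior and a fused LLR."""
    odds = prior / (1.0 - prior) * math.exp(llr)
    return odds / (1.0 + odds)


# Illustrative windows: a dense weak response and a sparse strong one.
dense = [0.2] * 9
sparse = [0.0] * 8 + [1.8]

dense_sum = minkowski_pool(dense, 1)           # summation pools the weak evidence
sparse_max = minkowski_pool(sparse, math.inf)  # max keeps the isolated peak
dense_max = minkowski_pool(dense, math.inf)    # max alone under-rates the dense cue
```

In this toy setting, summation and maximum give similar pooled values for the two windows only when each is matched to the cue's sparsity, which is the motivation for letting the exponent vary. A prior fed back from an attentive stage could be passed as the `prior` argument of `posterior`, again as an assumed reading of the closed-loop description.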

Author information

Authors and Affiliations

Centre for Vision Research, York University, Toronto, Ontario, M3J 1P3
J. H. Elder, S. J. D. Prince, Y. Hou, M. Sizintsev & E. Olevskiy

About this article

Cite this article

Elder, J.H., Prince, S.J.D., Hou, Y. et al. Pre-Attentive and Attentive Detection of Humans in Wide-Field Scenes. Int J Comput Vision 72, 47–66 (2007). https://doi.org/10.1007/s11263-006-8892-7

Received: 15 December 2004
Revised: 11 November 2005
Accepted: 14 March 2006
Published: 17 July 2006
Issue Date: April 2007
DOI: https://doi.org/10.1007/s11263-006-8892-7
