How close are we to solving the problem of automated visual surveillance?

Dee, Hannah M.; Velastin, Sergio A.

doi:10.1007/s00138-007-0077-z

A review of real-world surveillance, scientific progress and evaluative mechanisms

Special Issue Paper
Published: 05 May 2007

Volume 19, pages 329–343 (2008)
Machine Vision and Applications

Hannah M. Dee¹ & Sergio A. Velastin²

Abstract

The problem of automated visual surveillance has spawned a lively research area, with 2005 seeing three conferences or workshops and special issues of two major journals devoted to the topic. These alone are responsible for somewhere in the region of 240 papers and posters on automated visual surveillance before we begin to count those presented in more general fora. Many of these systems and algorithms perform one small sub-part of the surveillance task, such as motion detection. But even with low-level image processing tasks it is often difficult to compare systems on the basis of published results alone. This review paper aims to answer the difficult question “How close are we to developing surveillance-related systems which are really useful?” The first section of this paper considers the question of surveillance in the real world: installations, systems and practices. The main body of the paper then considers existing computer vision techniques with an emphasis on higher-level processes such as behaviour modelling and event detection. We conclude with a review of the evaluative mechanisms that have grown from within the computer vision community in an attempt to provide some form of robust evaluation and cross-system comparability.

Author information

Authors and Affiliations

School of Computing, University of Leeds, Leeds, LS2 9JT, UK
Hannah M. Dee
Digital Imaging Research Centre, Kingston University, Kingston-upon-Thames, KT1 2EE, UK
Sergio A. Velastin

Corresponding author

Correspondence to Hannah M. Dee.

About this article

Cite this article

Dee, H.M., Velastin, S.A. How close are we to solving the problem of automated visual surveillance? Machine Vision and Applications 19, 329–343 (2008). https://doi.org/10.1007/s00138-007-0077-z

Received: 15 April 2006
Accepted: 18 March 2007
Published: 05 May 2007
Issue Date: October 2008
DOI: https://doi.org/10.1007/s00138-007-0077-z
