Abstract
Detecting events of interest in video sequences, and searching for and retrieving events from video databases, are important and challenging problems. "Event of interest" is a very general term, since events of interest vary significantly among applications and users. A system that can only detect and/or retrieve a finite set of predefined events is of limited use. The event detection and retrieval problems therefore introduce additional challenges, including giving the user the flexibility to specify customized events of varying complexity and communicating user-defined events to the system in a generic way. This paper presents a spatio-temporal event detection system that lets users specify semantically high-level, composite events and then detects their occurrences automatically. Events can be defined on a single camera view or across multiple camera views. In addition to extracting information from videos, detecting customized events, and generating real-time alerts, the proposed system uses the extracted information for search, retrieval, data management, and investigation. Generated event metadata is mapped into tables of a relational database against which queries can be launched, so events can be retrieved based on various attributes, and a variety of statistics can be computed on the event data. Thus, the presented system provides the capabilities of a fully integrated smart system.













Additional information
This work was supported in part by the National Science Foundation under grant CNS-0834753 and the EPSCoR First Award.
Appendix
The XML file created for the example scenario described in Section 4.1 is outlined below, and the proposed format of the XML file is explained using this example. The outermost node of the created XML file is <EVENT_SCENARIOS>. When this node is opened, we see the following:
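A minimal sketch of this top level, with the contents of each child node elided, is shown here; the exact formatting of the actual file may differ:

    <EVENT_SCENARIOS>
      <Event_Scenario> ... </Event_Scenario>  <!-- composite event of Section 4.1 -->
      <Event_Scenario> ... </Event_Scenario>  <!-- separately defined primitive event -->
    </EVENT_SCENARIOS>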
There are two <Event_Scenario> nodes, one for the composite event described in Section 4.1, and one for a primitive event defined separately. When the first <Event_Scenario> node is expanded, we see:
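A minimal sketch of this structure, consistent with the description that follows, is shown below; the <Min> and <Max> element names inside <Frame_Interval>, as well as all elided bodies and values, are illustrative assumptions:

    <Event_Scenario>
      <OperationType>SEQUENCE</OperationType>
      <Argument>
        <ArgumentType>PRIMITIVE</ArgumentType>
        <Individual_Alarm_Definition> ... </Individual_Alarm_Definition>
      </Argument>
      <Frame_Interval>
        <Min>0</Min>    <!-- minimum frames between the two arguments; element name assumed -->
        <Max>150</Max>  <!-- maximum frames between the two arguments; element name assumed -->
      </Frame_Interval>
      <Argument>
        <ArgumentType>PRIMITIVE</ArgumentType>
        <Individual_Alarm_Definition> ... </Individual_Alarm_Definition>
      </Argument>
    </Event_Scenario>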


The first entry in the <Event_Scenario> node is the <OperationType>, which is the SEQUENCE operator in this case. The next entry is a node called <Argument>, which represents the first argument of the SEQUENCE operation. This node is followed by another node called <Frame_Interval>, and the last entry is a second <Argument> node representing the second operand of the SEQUENCE operation. As seen above, when an <Argument> node is expanded, its first entry is the <ArgumentType>, stating whether this argument is a composite or a primitive event. In this case, the argument is a PRIMITIVE event. If the argument were a COMPOSITE event instead, the next entry would again be an <OperationType>, stating the type of the operation relating the arguments of that composite argument. If the <ArgumentType> is PRIMITIVE, the next node is the <Individual_Alarm_Definition>, which includes all the information about the primitive event definition. The <Frame_Interval> node of the <Event_Scenario> holds the minimum and maximum number of frames desired between the arguments of the SEQUENCE operation.
When the <Individual_Alarm_Definition> node is expanded, ViewID is the ID number of the camera view, identifying which camera view the event was defined on. ClassID is the name of the primitive event, which is one of the six primitives we have. Identity is the description given to the event. The <ViewDim> node holds information about the image of the camera view, specifically its width, height, and origin location. If the definition of a primitive event includes an ROI, this information is saved under the <ROIs> node, which contains a <Polygon> node for each ROI. Each <Polygon> node is composed of <Vertex> nodes holding the <X> and <Y> coordinates of the vertices. If a schedule is defined for a primitive event, this information is saved under the <Active> node, which has the month, day, date, hour, and minute information for the start and end times of the schedule. The other parameters of the primitive event are saved under the <Parameters> node; for instance, it can hold the vertices of the trip wires or the minimum and maximum detected object sizes. Finally, the <Response> node has <Email> and <Command> entries: <Email> holds the information about where to send an e-mail, and <Command> is the command to execute when the event occurs.
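A condensed sketch of this node is shown below; all values, and the elided bodies of <ViewDim>, <Active>, and <Parameters>, are illustrative assumptions:

    <Individual_Alarm_Definition>
      <ViewID>1</ViewID>                      <!-- camera view the event is defined on -->
      <ClassID>TRIP_WIRE</ClassID>            <!-- one of the six primitive event types; value assumed -->
      <Identity>Door crossing</Identity>      <!-- user-given description; value assumed -->
      <ViewDim> ... </ViewDim>                <!-- image width, height, and origin location -->
      <ROIs>
        <Polygon>                             <!-- one <Polygon> per ROI -->
          <Vertex><X>10</X><Y>20</Y></Vertex>
          <Vertex><X>200</X><Y>20</Y></Vertex>
          <Vertex><X>200</X><Y>150</Y></Vertex>
        </Polygon>
      </ROIs>
      <Active> ... </Active>                  <!-- schedule: month, day, date, hour, minute of start/end -->
      <Parameters> ... </Parameters>          <!-- e.g., trip-wire vertices, min/max object sizes -->
      <Response>
        <Email>operator@example.com</Email>   <!-- address to notify; value assumed -->
        <Command>record_clip.sh</Command>     <!-- command executed on the event; value assumed -->
      </Response>
    </Individual_Alarm_Definition>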
The second <Event_Scenario> node in our example XML file has the information about an “Abandoned Object” primitive event defined separately. When an event scenario is only a primitive event, the <OperationType> is NONE, and there is only one <Argument>.
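A minimal sketch of such a primitive-only scenario, with the alarm definition body elided, is:

    <Event_Scenario>
      <OperationType>NONE</OperationType>     <!-- scenario consists of a single primitive event -->
      <Argument>
        <ArgumentType>PRIMITIVE</ArgumentType>
        <Individual_Alarm_Definition> ... </Individual_Alarm_Definition>  <!-- the "Abandoned Object" definition -->
      </Argument>
    </Event_Scenario>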
Cite this article
Velipasalar, S., Brown, L.M. & Hampapur, A. Detection of user-defined, semantically high-level, composite events, and retrieval of event queries. Multimed Tools Appl 50, 249–278 (2010). https://doi.org/10.1007/s11042-010-0489-z