
Detection of user-defined, semantically high-level, composite events, and retrieval of event queries

Published in: Multimedia Tools and Applications

Abstract

Detecting events of interest in video sequences, and searching for and retrieving events from video databases, are important and challenging problems. "Event of interest" is a very general term, since events of interest can vary significantly across applications and users. A system that can detect and/or retrieve only a finite set of predefined events is of limited use. The event detection and retrieval problems therefore introduce additional challenges, including giving the user the flexibility to specify customized events of varying complexity, and communicating user-defined events to a system in a generic way. This paper presents a spatio-temporal event detection system that lets users specify semantically high-level, composite events, and then detects their occurrences automatically. Events can be defined on a single camera view or across multiple camera views. In addition to extracting information from videos, detecting customized events, and generating real-time alerts, the proposed system uses the extracted information for search, retrieval, data management and investigation. Generated event metadata is mapped into tables in a relational database against which queries may be launched, so events can be retrieved based on various attributes and a variety of statistics can be computed on the event data. The presented system thus provides the capabilities of a fully integrated smart system.




Author information


Corresponding author

Correspondence to Senem Velipasalar.

Additional information

This work was supported in part by the National Science Foundation under grant CNS-0834753 and the EPSCoR First Award.

Appendix


The XML file created for the example scenario described in Section 4.1 is shown below; the proposed XML format is explained using this example. The outermost node of the created XML file is <EVENT_SCENARIOS>. When this node is opened, we see the following:

$$ \begin{array}{l} \texttt{- <EVENT\_SCENARIOS>}\\ \texttt{+ <Event\_Scenario>}\\ \texttt{+ <Event\_Scenario>}\\ \texttt{</EVENT\_SCENARIOS>} \end{array} $$

There are two <Event_Scenario> nodes: one for the composite event described in Section 4.1, and one for a primitive event defined separately. When the first <Event_Scenario> node is expanded, its structure is as follows.

The first entry in the <Event_Scenario> node is the OperationType, which in this case is the SEQUENCE operator. The next entry is a node called <Argument>, which represents the first argument of the SEQUENCE operation. This node is followed by another node called <Frame_Interval>. The last entry is a second <Argument> node, representing the second operand of the SEQUENCE operation. When an <Argument> node is expanded, its first entry is the ArgumentType, stating whether this argument is a composite or a primitive event. In this case, the argument is a PRIMITIVE event. If the argument were a COMPOSITE event instead, the next entry would again be an OperationType, stating the type of the operation relating the arguments of that composite argument. If the ArgumentType is PRIMITIVE, the next node is the <Individual_Alarm_Definition>, which includes all the information about the primitive event definition. The <Frame_Interval> node of the <Event_Scenario> holds the minimum and maximum number of frames desired between the arguments of the SEQUENCE operation.
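The structure just described can be sketched as below. This is a reconstruction from the description above, not a verbatim listing from the paper: the element names for the frame-interval bounds and all literal values are illustrative assumptions.

```xml
<!-- Sketch of the expanded first <Event_Scenario> node; frame-interval
     element names and values are assumptions. -->
<Event_Scenario>
  <OperationType>SEQUENCE</OperationType>
  <Argument>
    <ArgumentType>PRIMITIVE</ArgumentType>
    <Individual_Alarm_Definition><!-- first primitive event --></Individual_Alarm_Definition>
  </Argument>
  <Frame_Interval>
    <Min_Frames>0</Min_Frames>    <!-- assumed element name -->
    <Max_Frames>300</Max_Frames>  <!-- assumed element name and value -->
  </Frame_Interval>
  <Argument>
    <ArgumentType>PRIMITIVE</ArgumentType>
    <Individual_Alarm_Definition><!-- second primitive event --></Individual_Alarm_Definition>
  </Argument>
</Event_Scenario>
```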

When the <Individual_Alarm_Definition> node is expanded, ViewID is the ID number of the camera view on which the event was defined. ClassID is the name of the primitive event, which is one of the six primitives we have. Identity is the description given to the event. The <ViewDim> node holds information about the image of the camera view, specifically its width, height, and origin location. If the definition of a primitive event includes a ROI, this information is saved under the <ROIs> node, which has a <Polygon> node for each ROI. Each <Polygon> node is composed of <Vertex> nodes holding the <X> and <Y> coordinates of the vertices. If a schedule is defined for a primitive event, this information is saved under the <Active> node, which has the month, day, date, hour and minute information for the start and end times of the schedule. The other parameters of the primitive event are saved under the <Parameters> node; for instance, it can hold the vertices of the trip wires or the minimum and maximum detected object sizes. Finally, the <Response> node has <Email> and <Command> entries: <Email> holds the address to which an e-mail is sent, and <Command> is the command to execute when the event occurs.
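As a concrete illustration, one primitive definition might look like the sketch below. All literal values, the ClassID string, and the inner element names of <ViewDim>, <Active> and <Parameters> are hypothetical, since the paper does not list them.

```xml
<!-- Hypothetical sketch of one <Individual_Alarm_Definition>; literal
     values and the inner elements of <ViewDim>, <Active> and
     <Parameters> are assumptions. -->
<Individual_Alarm_Definition>
  <ViewID>1</ViewID>
  <ClassID>TRIP_WIRE</ClassID>               <!-- assumed primitive name -->
  <Identity>Person crossing lobby entrance</Identity>
  <ViewDim>
    <Width>320</Width><Height>240</Height>   <!-- assumed element names -->
  </ViewDim>
  <ROIs>
    <Polygon>
      <Vertex><X>40</X><Y>60</Y></Vertex>
      <Vertex><X>200</X><Y>60</Y></Vertex>
      <Vertex><X>200</X><Y>180</Y></Vertex>
      <Vertex><X>40</X><Y>180</Y></Vertex>
    </Polygon>
  </ROIs>
  <Active><!-- start/end month, day, date, hour, minute --></Active>
  <Parameters><!-- e.g. trip-wire vertices, min/max object size --></Parameters>
  <Response>
    <Email>operator@example.com</Email>
    <Command>sound_alarm.sh</Command>
  </Response>
</Individual_Alarm_Definition>
```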

The second <Event_Scenario> node in our example XML file contains the definition of an "Abandoned Object" primitive event, defined separately. When an event scenario consists of only a primitive event, the <OperationType> is NONE, and there is only one <Argument>.
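Such a single-primitive scenario can be sketched as follows, with the primitive's inner definition elided:

```xml
<!-- Sketch of a scenario wrapping a single primitive event. -->
<Event_Scenario>
  <OperationType>NONE</OperationType>
  <Argument>
    <ArgumentType>PRIMITIVE</ArgumentType>
    <Individual_Alarm_Definition>
      <!-- "Abandoned Object" primitive definition omitted -->
    </Individual_Alarm_Definition>
  </Argument>
</Event_Scenario>
```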


Cite this article

Velipasalar, S., Brown, L.M. & Hampapur, A. Detection of user-defined, semantically high-level, composite events, and retrieval of event queries. Multimed Tools Appl 50, 249–278 (2010). https://doi.org/10.1007/s11042-010-0489-z
