Abstract
Detecting events of interest in video sequences, and searching for and retrieving events from video databases, are important and challenging problems. "Event of interest" is a very general term, since events of interest vary significantly among applications and users. A system that can only detect and/or retrieve a finite set of predefined events is of limited use. The event detection and retrieval problems therefore introduce additional challenges, including giving the user the flexibility to specify customized events of varying complexity and communicating user-defined events to the system in a generic way. This paper presents a spatio-temporal event detection system that lets users specify semantically high-level, composite events and then detects their occurrences automatically. Events can be defined on a single camera view or across multiple camera views. In addition to extracting information from videos, detecting customized events, and generating real-time alerts, the proposed system uses the extracted information for search, retrieval, data management, and investigation. Generated event metadata is mapped into tables of a relational database against which queries can be launched, so events can be retrieved based on various attributes, and a variety of statistics can be computed on the event data. Thus, the presented system provides the capabilities of a fully integrated smart system.













Additional information
This work was supported in part by the National Science Foundation under grant CNS-0834753 and the EPSCoR First Award.
Appendix
The XML file created for the example scenario described in Section 4.1 is outlined below, and the proposed format of the XML file is explained using this example. The outermost node of the created XML file is <EVENT_SCENARIOS>. When this node is opened, we see the following:
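A minimal sketch of this top level, with the contents of each child node elided, is shown here; the exact formatting of the actual file may differ:

    <EVENT_SCENARIOS>
      <Event_Scenario> ... </Event_Scenario>  <!-- composite event of Section 4.1 -->
      <Event_Scenario> ... </Event_Scenario>  <!-- separately defined primitive event -->
    </EVENT_SCENARIOS>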
There are two <Event_Scenario> nodes, one for the composite event described in Section 4.1, and one for a primitive event defined separately. When the first <Event_Scenario> node is expanded, we see:
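A minimal sketch of this structure, consistent with the description that follows, is shown below; the <Min> and <Max> element names inside <Frame_Interval>, as well as all elided bodies and values, are illustrative assumptions:

    <Event_Scenario>
      <OperationType>SEQUENCE</OperationType>
      <Argument>
        <ArgumentType>PRIMITIVE</ArgumentType>
        <Individual_Alarm_Definition> ... </Individual_Alarm_Definition>
      </Argument>
      <Frame_Interval>
        <Min>0</Min>    <!-- minimum frames between the two arguments; element name assumed -->
        <Max>150</Max>  <!-- maximum frames between the two arguments; element name assumed -->
      </Frame_Interval>
      <Argument>
        <ArgumentType>PRIMITIVE</ArgumentType>
        <Individual_Alarm_Definition> ... </Individual_Alarm_Definition>
      </Argument>
    </Event_Scenario>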


The first entry in the <Event_Scenario> node is the <OperationType>, which is the SEQUENCE operator in this case. The next entry is a node called <Argument>, which represents the first argument of the SEQUENCE operation. This node is followed by another node called <Frame_Interval>, and the last entry is a second <Argument> node representing the second operand of the SEQUENCE operation. As seen above, when an <Argument> node is expanded, its first entry is the <ArgumentType>, stating whether this argument is a composite or a primitive event. In this case, the argument is a PRIMITIVE event. If the argument were a COMPOSITE event instead, the next entry would again be an <OperationType>, stating the type of the operation relating the arguments of that composite argument. If the <ArgumentType> is PRIMITIVE, the next node is the <Individual_Alarm_Definition>, which includes all the information about the primitive event definition. The <Frame_Interval> node of the <Event_Scenario> holds the minimum and maximum number of frames desired between the arguments of the SEQUENCE operation.
When the <Individual_Alarm_Definition> node is expanded, ViewID is the ID number of the camera view, identifying which camera view the event was defined on. ClassID is the name of the primitive event, which is one of the six primitives we have. Identity is the description given to the event. The <ViewDim> node holds information about the image of the camera view, specifically its width, height, and origin location. If the definition of a primitive event includes an ROI, this information is saved under the <ROIs> node, which contains a <Polygon> node for each ROI. Each <Polygon> node is composed of <Vertex> nodes holding the <X> and <Y> coordinates of the vertices. If a schedule is defined for a primitive event, this information is saved under the <Active> node, which has the month, day, date, hour, and minute information for the start and end times of the schedule. The other parameters of the primitive event are saved under the <Parameters> node; for instance, it can hold the vertices of the trip wires or the minimum and maximum detected object sizes. Finally, the <Response> node has <Email> and <Command> entries: <Email> holds the information about where to send an e-mail, and <Command> is the command to execute when the event occurs.
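A condensed sketch of this node is shown below; all values, and the elided bodies of <ViewDim>, <Active>, and <Parameters>, are illustrative assumptions:

    <Individual_Alarm_Definition>
      <ViewID>1</ViewID>                      <!-- camera view the event is defined on -->
      <ClassID>TRIP_WIRE</ClassID>            <!-- one of the six primitive event types; value assumed -->
      <Identity>Door crossing</Identity>      <!-- user-given description; value assumed -->
      <ViewDim> ... </ViewDim>                <!-- image width, height, and origin location -->
      <ROIs>
        <Polygon>                             <!-- one <Polygon> per ROI -->
          <Vertex><X>10</X><Y>20</Y></Vertex>
          <Vertex><X>200</X><Y>20</Y></Vertex>
          <Vertex><X>200</X><Y>150</Y></Vertex>
        </Polygon>
      </ROIs>
      <Active> ... </Active>                  <!-- schedule: month, day, date, hour, minute of start/end -->
      <Parameters> ... </Parameters>          <!-- e.g., trip-wire vertices, min/max object sizes -->
      <Response>
        <Email>operator@example.com</Email>   <!-- address to notify; value assumed -->
        <Command>record_clip.sh</Command>     <!-- command executed on the event; value assumed -->
      </Response>
    </Individual_Alarm_Definition>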
The second <Event_Scenario> node in our example XML file has the information about an “Abandoned Object” primitive event defined separately. When an event scenario is only a primitive event, the <OperationType> is NONE, and there is only one <Argument>.
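A minimal sketch of such a primitive-only scenario, with the alarm definition body elided, is:

    <Event_Scenario>
      <OperationType>NONE</OperationType>     <!-- scenario consists of a single primitive event -->
      <Argument>
        <ArgumentType>PRIMITIVE</ArgumentType>
        <Individual_Alarm_Definition> ... </Individual_Alarm_Definition>  <!-- the "Abandoned Object" definition -->
      </Argument>
    </Event_Scenario>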
Cite this article
Velipasalar, S., Brown, L.M. & Hampapur, A. Detection of user-defined, semantically high-level, composite events, and retrieval of event queries. Multimed Tools Appl 50, 249–278 (2010). https://doi.org/10.1007/s11042-010-0489-z