DOI: 10.1145/2733373.2806221
Research Article

EventNet: A Large Scale Structured Concept Library for Complex Event Detection in Video

Published: 13 October 2015

Abstract

Event-specific concepts are semantic concepts designed specifically for events of interest, and they can serve as a mid-level representation of complex events in videos. Existing methods focus only on defining event-specific concepts for a small number of pre-defined events and cannot handle novel, unseen events. This motivates us to build a large-scale event-specific concept library that covers as many real-world events and their concepts as possible. Specifically, we choose WikiHow, an online forum containing a large number of how-to articles on human daily-life events. We perform a coarse-to-fine event discovery process and discover 500 events from WikiHow articles. We then use each event name as a query to search YouTube and discover event-specific concepts from the tags of the returned videos. After an automatic filtering process, we end up with 95,321 videos and 4,490 concepts. We train a Convolutional Neural Network (CNN) model on the 95,321 videos over the 500 events, and use the model to extract deep learning features from video content. With the learned deep learning features, we train 4,490 binary SVM classifiers as the event-specific concept library. The concepts and events are further organized in a hierarchical structure defined by WikiHow, and the resultant concept library is called EventNet. Finally, the EventNet concept library is used to generate concept-based representations of event videos. To the best of our knowledge, EventNet represents the first video event ontology that organizes events and their concepts into a semantic structure. It offers great potential for event retrieval and browsing. Extensive experiments on the zero-shot event retrieval task, in which no training samples are available, show that the proposed EventNet concept library consistently and significantly outperforms the state-of-the-art (such as the 20K ImageNet concepts trained with CNN) by a large margin of up to 207%. We also show that the EventNet structure can help users find relevant concepts for novel event queries that cannot be well handled by conventional text-based semantic analysis alone. The unique two-step approach of first applying event detection models and then detecting event-specific concepts also offers great potential to improve the efficiency and accuracy of event recounting, since only a small number of event-specific concept classifiers need to be fired after event detection.
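
As a rough illustration of how such a concept library could be used for zero-shot retrieval, the Python sketch below trains one binary SVM per concept on shared CNN features and then ranks unseen videos by the scores of the concepts matched to a textual event query. It is a minimal sketch under stated assumptions, not the authors' released implementation: the feature matrices, the concept labels derived from YouTube tags, and the mapping from an event query to a handful of EventNet concepts (e.g., via the WikiHow hierarchy or textual similarity) are all taken as given, and every function and variable name is illustrative.

    # Minimal sketch of concept-based zero-shot event retrieval in the spirit of
    # EventNet. Inputs (features, labels, query-to-concept mapping) are assumed.
    import numpy as np
    from sklearn.svm import LinearSVC

    def train_concept_classifiers(cnn_features, concept_labels):
        """Train one binary SVM per event-specific concept.

        cnn_features: (num_videos, feature_dim) array of deep features.
        concept_labels: dict mapping concept name -> (num_videos,) 0/1 labels.
        """
        classifiers = {}
        for concept, labels in concept_labels.items():
            clf = LinearSVC(C=1.0)
            clf.fit(cnn_features, labels)
            classifiers[concept] = clf
        return classifiers

    def concept_representation(classifiers, video_features):
        """Concept-based representation: one SVM decision score per concept."""
        names = sorted(classifiers)
        scores = np.stack(
            [classifiers[n].decision_function(video_features) for n in names],
            axis=1,
        )  # shape: (num_videos, num_concepts)
        return names, scores

    def zero_shot_rank(names, scores, query_concepts):
        """Rank videos by the mean score of the concepts matched to the query."""
        idx = [names.index(c) for c in query_concepts if c in names]
        relevance = scores[:, idx].mean(axis=1)
        return np.argsort(-relevance)  # video indices, most relevant first

Averaging the decision scores of the matched concepts is only one plausible aggregation rule; the point of the sketch is the two-stage structure (shared deep features, then lightweight per-concept classifiers), which is what makes it cheap to fire only a small set of event-specific concept detectors after event detection.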




    Published In

    MM '15: Proceedings of the 23rd ACM international conference on Multimedia
    October 2015
    1402 pages
    ISBN:9781450334594
    DOI:10.1145/2733373
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 October 2015


    Author Tags

    1. concept based video representation
    2. multimedia event detection
    3. structured concept ontology
    4. zero-shot retrieval

    Qualifiers

    • Research-article

    Funding Sources

    • Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center

    Conference

    MM '15: ACM Multimedia Conference
    October 26 - 30, 2015
    Brisbane, Australia

    Acceptance Rates

    MM '15 Paper Acceptance Rate: 56 of 252 submissions, 22%
    Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%


    Cited By

    • A Systematic Review of Event-Matching Methods for Complex Event Detection in Video Streams. Sensors 24(22):7238, 2024. https://doi.org/10.3390/s24227238
    • A Survey of Video Datasets for Grounded Event Understanding. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 7314-7327, 2024. https://doi.org/10.1109/CVPRW63382.2024.00727
    • Multimodal semantic enhanced representation network for micro-video event detection. Knowledge-Based Systems 301:112255, 2024. https://doi.org/10.1016/j.knosys.2024.112255
    • Crowd behavior detection: leveraging video swin transformer for crowd size and violence level analysis. Applied Intelligence, 2024. https://doi.org/10.1007/s10489-024-05775-6
    • Temporal Dynamic Concept Modeling Network for Explainable Video Event Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 19(6):1-22, 2023. https://doi.org/10.1145/3568312
    • Surch: Enabling Structural Search and Comparison for Surgical Videos. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pp. 1-17, 2023. https://doi.org/10.1145/3544548.3580772
    • Video Label Enhancing and Standardization through Transcription and WikiId Mapping Techniques. 2023 11th International Symposium on Electronic Systems Devices and Computing (ESDC), pp. 1-7, 2023. https://doi.org/10.1109/ESDC56251.2023.10149851
    • In Defense of Structural Symbolic Representation for Video Event-Relation Prediction. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 4940-4950, 2023. https://doi.org/10.1109/CVPRW59228.2023.00522
    • A Survey of Data Representation for Multi-Modality Event Detection and Evolution. Applied Sciences 12(4):2204, 2022. https://doi.org/10.3390/app12042204
    • Zero-Shot Video Event Detection With High-Order Semantic Concept Discovery and Matching. IEEE Transactions on Multimedia 24:1896-1908, 2022. https://doi.org/10.1109/TMM.2021.3073624
