DOI: 10.1145/2522848.2522883
poster

A Markov logic framework for recognizing complex events from multimodal data

Published: 09 December 2013

Abstract

We present a general framework for complex event recognition that is well suited to integrating information that varies widely in detail and granularity. Consider an agent in an instrumented space who performs a complex task while describing, in natural language, what he is doing. The system takes in a variety of information, including objects and gestures recognized from RGB-D data and descriptions of events extracted from recognized and parsed speech. It outputs a complete reconstruction of the agent's plan, explaining actions in terms of more complex activities and filling in unobserved but necessary events. We show how to use Markov Logic (a probabilistic extension of first-order logic) to create a model in which (1) observations can be partial, noisy, and refer to future or temporally ambiguous events; (2) complex events are composed from simpler events in a manner that exposes their structure for inference and learning; and (3) uncertainty is handled in a sound probabilistic manner. We demonstrate the effectiveness of the approach for tracking kitchen activities in the presence of noisy and incomplete observations.
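To make the Markov Logic idea concrete, the following is a minimal sketch and not the model from the paper. It assumes a hypothetical kitchen scenario with four ground atoms (`MakeTea`, `BoilWater`, `SteepTea`, `ObsSteep`), hand-picked illustrative weights, and brute-force enumeration of possible worlds in place of a real Markov logic engine. Each world receives an unnormalized weight equal to the exponential of the summed weights of the formulas it satisfies, which is the standard Markov logic semantics. The sketch shows how a noisy observation of one sub-event can raise the probability of an unobserved but necessary sub-event through the complex event that contains both.

```python
import itertools
import math

# Hypothetical ground atoms (names are illustrative, not from the paper).
ATOMS = ["MakeTea", "BoilWater", "SteepTea", "ObsSteep"]


def implies(a, b):
    """Material implication a => b."""
    return (not a) or b


# Weighted formulas: (weight, truth function over a world dict).
# All weights are assumed values chosen only for illustration.
FORMULAS = [
    # A complex event requires each of its sub-events (near-hard rules).
    (4.0, lambda w: implies(w["MakeTea"], w["BoilWater"])),
    (4.0, lambda w: implies(w["MakeTea"], w["SteepTea"])),
    # A noisy observation is soft evidence for the event it reports.
    (2.0, lambda w: implies(w["ObsSteep"], w["SteepTea"])),
    # A sub-event tends to be explained by the complex event.
    (1.5, lambda w: implies(w["SteepTea"], w["MakeTea"])),
    # Weak prior against the complex event occurring at all.
    (0.5, lambda w: not w["MakeTea"]),
]


def world_weight(world):
    """Unnormalized weight: exp of the summed weights of satisfied formulas."""
    return math.exp(sum(wt for wt, formula in FORMULAS if formula(world)))


def marginal(query, evidence):
    """P(query is true | evidence), by enumerating all consistent worlds."""
    hidden = [a for a in ATOMS if a not in evidence]
    num = den = 0.0
    for values in itertools.product([False, True], repeat=len(hidden)):
        world = dict(evidence, **dict(zip(hidden, values)))
        wt = world_weight(world)
        den += wt
        if world[query]:
            num += wt
    return num / den


if __name__ == "__main__":
    for obs in (False, True):
        p = marginal("BoilWater", {"ObsSteep": obs})
        print(f"P(BoilWater | ObsSteep={obs}) = {p:.3f}")
```

Enumeration is exponential in the number of atoms, so it is only viable for toy models like this one; it is used here because it keeps the semantics visible in a few lines.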

Published In

ICMI '13: Proceedings of the 15th ACM on International conference on multimodal interaction
December 2013
630 pages
ISBN: 9781450321297
DOI: 10.1145/2522848

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. markov logic
  2. multimodal interaction
  3. plan recognition

Conference

ICMI '13

Acceptance Rates

ICMI '13 Paper Acceptance Rate 49 of 133 submissions, 37%;
Overall Acceptance Rate 453 of 1,080 submissions, 42%
