A Self-Referential Perceptual Inference Framework for Video Interpretation

Town, Christopher; Sinclair, David

doi:10.1007/3-540-36592-3_6

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2626)

Included in the following conference series:

International Conference on Computer Vision Systems

Abstract

This paper presents an extensible architectural model for general content-based analysis and indexing of video data, which can be customised for a given problem domain. Video interpretation is approached as a joint inference problem that can be solved using modern machine learning and probabilistic inference techniques. An important aspect of the work is a novel active knowledge representation methodology based on an ontological query language. This representation allows the problem of video analysis to be posed as queries expressed in a visual language that incorporates prior hierarchical knowledge of the syntactic and semantic structure of the entities, relationships, and events of interest in a video sequence. Perceptual inference then takes place within an ontological domain defined by the structure of the problem and the current goal set.

Author information

Authors and Affiliations

University of Cambridge Computer Laboratory, 15 JJ Thomson Avenue, Cambridge, CB3 0FD, UK
Christopher Town
Waimara Ltd, 115 Ditton Walk, Cambridge, UK
David Sinclair

Editor information

Editors and Affiliations

INRIA Rhône-Alpes, 655 Ave de l’Europe, 38330, Montbonnot, France
James L. Crowley
Montefiore Institute, University of Liège, 4000, Liège Sart-Tilman, Belgium
Justus H. Piater
Automation and Control Institute, Vienna University of Technology, Gusshausstraße 27/376, 1040, Vienna, Austria
Markus Vincze
Institute of Digital Image Processing, Joanneum Research, Wastiangasse 6, 8010, Graz, Austria
Lucas Paletta

About this paper

Cite this paper

Town, C., Sinclair, D. (2003). A Self-Referential Perceptual Inference Framework for Video Interpretation. In: Crowley, J.L., Piater, J.H., Vincze, M., Paletta, L. (eds) Computer Vision Systems. ICVS 2003. Lecture Notes in Computer Science, vol 2626. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36592-3_6

DOI: https://doi.org/10.1007/3-540-36592-3_6
Published: 14 March 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00921-4
Online ISBN: 978-3-540-36592-1
eBook Packages: Springer Book Archive
