Abstract
This paper presents an extensible architectural model for general content-based analysis and indexing of video data, which can be customised for a given problem domain. Video interpretation is approached as a joint inference problem which can be solved through the use of modern machine learning and probabilistic inference techniques. An important aspect of the work concerns the use of a novel active knowledge representation methodology based on an ontological query language. This representation allows one to pose the problem of video analysis in terms of queries expressed in a visual language incorporating prior hierarchical knowledge of the syntactic and semantic structure of entities, relationships, and events of interest occurring in a video sequence. Perceptual inference then takes place within an ontological domain defined by the structure of the problem and the current goal set.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Town, C., Sinclair, D. (2003). A Self-Referential Perceptual Inference Framework for Video Interpretation. In: Crowley, J.L., Piater, J.H., Vincze, M., Paletta, L. (eds) Computer Vision Systems. ICVS 2003. Lecture Notes in Computer Science, vol 2626. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36592-3_6
DOI: https://doi.org/10.1007/3-540-36592-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00921-4
Online ISBN: 978-3-540-36592-1
eBook Packages: Springer Book Archive