Abstract
This paper analyzes different ways of coupling the information from multiple visual features in the representation of visual contents using temporal models based on Markov chains. We assume that the optimal combination is given by the Cartesian product of all feature state spaces. Simpler model structures are obtained by assuming independencies between random variables in the probabilistic structure. The relative entropy provides a measure of the information loss of a simplified structure with respect to a more complex one. The loss of information is then compared to the loss of accuracy in the representation of visual contents in video sequences, which is measured in terms of shot retrieval performance. We reach three main conclusions: (1) the fully coupled model structure is an accurate approximation to the Cartesian product structure, (2) the largest loss of information is found when direct temporal dependencies are removed, and (3) there is a direct relationship between loss of information and loss of representation accuracy.
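The idea of measuring information loss with relative entropy can be illustrated with a minimal sketch. It assumes two discrete features, an irreducible and aperiodic joint Markov chain over the Cartesian product of their state spaces, and uses the relative entropy rate between stationary Markov chains as the loss measure; the simplified structure shown (one independent chain per feature) is only one possible simplification and is not necessarily one of the structures evaluated in the paper. The function names `stationary`, `kl_rate` and `independent_approx` are hypothetical.

```python
import itertools
import math


def stationary(P, iters=10_000, tol=1e-12):
    """Stationary distribution of a row-stochastic matrix via power iteration.

    Assumes the chain is irreducible and aperiodic so the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def kl_rate(P, Q):
    """Relative entropy rate D(P || Q), in nats, between two Markov chains.

    D = sum_i pi_i sum_j P_ij log(P_ij / Q_ij), where pi is the stationary
    distribution of P. Requires Q_ij > 0 wherever P_ij > 0.
    """
    pi = stationary(P)
    d = 0.0
    for i, row in enumerate(P):
        for j, p in enumerate(row):
            if p > 0:
                d += pi[i] * p * math.log(p / Q[i][j])
    return d


def _normalize_rows(M):
    """Turn a matrix of non-negative weights into a row-stochastic matrix."""
    out = []
    for row in M:
        total = sum(row)
        out.append([x / total for x in row])
    return out


def independent_approx(P, na, nb):
    """Hypothetical simplified structure: one Markov chain per feature.

    P is the joint chain over the Cartesian product of feature A (na states)
    and feature B (nb states); state (a, b) has index a * nb + b. Each
    feature's transitions are marginalized from P under its stationary
    distribution, and the approximation is their product:
    Q((a2, b2) | (a, b)) = PA(a2 | a) * PB(b2 | b).
    """
    states = list(itertools.product(range(na), range(nb)))
    pi = stationary(P)
    TA = [[0.0] * na for _ in range(na)]
    TB = [[0.0] * nb for _ in range(nb)]
    for i, (a, b) in enumerate(states):
        for j, (a2, b2) in enumerate(states):
            w = pi[i] * P[i][j]  # stationary probability of transition i -> j
            TA[a][a2] += w
            TB[b][b2] += w
    PA, PB = _normalize_rows(TA), _normalize_rows(TB)
    return [[PA[a][a2] * PB[b][b2] for (a2, b2) in states] for (a, b) in states]


# Usage: two binary features whose joint state tends to persist.
P = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
Q = independent_approx(P, 2, 2)
print(round(kl_rate(P, Q), 4))  # 0.0604
```

A larger value means the simplified structure discards more of the temporal coupling between the features; when the joint chain is already a product of two independent chains, the value is zero up to rounding.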
Work supported by CICYT grant TEL99-1206-C02-02. Partial funding from Visual Century Research.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sánchez, J.M., Binefa, X., Kender, J.R. (2003). Multiple Features in Temporal Models for the Representation of Visual Contents in Video. In: Bakker, E.M., Lew, M.S., Huang, T.S., Sebe, N., Zhou, X.S. (eds) Image and Video Retrieval. CIVR 2003. Lecture Notes in Computer Science, vol 2728. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45113-7_22
DOI: https://doi.org/10.1007/3-540-45113-7_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40634-1
Online ISBN: 978-3-540-45113-6
eBook Packages: Springer Book Archive