A generalized temporal context model for classifying image collections

Boutell, Matthew; Luo, Jiebo; Brown, Christopher

doi:10.1007/s00530-005-0202-7

Regular Paper
Published: 16 November 2005

Multimedia Systems, Volume 11, pages 82–92 (2005)

Abstract

Semantic scene classification is an open problem in computer vision, especially when information from only a single image is employed. In applications involving image collections, however, images are clustered sequentially, allowing surrounding images to be used as temporal context. We present a general probabilistic temporal context model in which the first-order Markov property is used to integrate content-based and temporal context cues. The model uses elapsed time-dependent transition probabilities between images to enforce the fact that images captured within a shorter period of time are more likely to be related. This model is generalized in that it allows arbitrary elapsed time between images, making it suitable for classifying image collections. In addition, we derive a variant of this model for use with ordered image collections for which no timestamp information is available, such as film scans. We applied the proposed context models to two problems, achieving significant gains in accuracy in both cases. The two algorithms used to implement inference within the context model, Viterbi and belief propagation, yielded similar results, with a slight edge to belief propagation.
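To make the idea concrete, the following is a minimal sketch of Viterbi decoding over a first-order Markov chain whose transition matrix depends on the elapsed time between consecutive images. It is an illustration only, not the authors' implementation. The exponential blend in `transition_matrix`, the time constant `tau_s`, and all function and parameter names are hypothetical. The abstract states only that transition probabilities depend on elapsed time, not what form that dependence takes. The per-image likelihoods stand in for the output of any content-based classifier and are assumed to be strictly positive.

```python
import math

def transition_matrix(base, prior, elapsed_s, tau_s=3600.0):
    """Hypothetical elapsed-time-dependent transitions.

    For short gaps the matrix is close to `base` (neighbors tend to share a
    class); for long gaps every row decays toward the class prior, so the
    previous image carries little information. The exponential form and
    `tau_s` are assumptions made for this sketch.
    """
    w = math.exp(-elapsed_s / tau_s)
    n = len(prior)
    return [[w * base[i][j] + (1.0 - w) * prior[j] for j in range(n)]
            for i in range(n)]

def viterbi(likelihoods, gaps_s, base, prior, tau_s=3600.0):
    """Most likely class sequence for an ordered image collection.

    likelihoods[t][j]: content-based evidence for class j on image t (> 0).
    gaps_s[t]: seconds elapsed between image t and image t + 1.
    """
    n = len(prior)
    # Log domain avoids underflow on long sequences.
    delta = [math.log(prior[j]) + math.log(likelihoods[0][j]) for j in range(n)]
    backpointers = []
    for t in range(1, len(likelihoods)):
        A = transition_matrix(base, prior, gaps_s[t - 1], tau_s)
        new_delta, ptr = [], []
        for j in range(n):
            # Best predecessor class for class j at image t.
            best_i = max(range(n), key=lambda i: delta[i] + math.log(A[i][j]))
            new_delta.append(delta[best_i] + math.log(A[best_i][j])
                             + math.log(likelihoods[t][j]))
            ptr.append(best_i)
        delta = new_delta
        backpointers.append(ptr)
    # Trace back from the best final class.
    state = max(range(n), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Toy example with two classes (0 = indoor, 1 = outdoor); all numbers invented.
prior = [0.5, 0.5]
base = [[0.9, 0.1], [0.1, 0.9]]
evidence = [[0.9, 0.1], [0.4, 0.6], [0.9, 0.1]]  # middle image is ambiguous

print(viterbi(evidence, [10.0, 10.0], base, prior))  # [0, 0, 0]
print(viterbi(evidence, [1e6, 1e6], base, prior))    # [0, 1, 0]
```

With gaps of ten seconds, the weakly "outdoor" middle image is relabeled to match its neighbors. With gaps of many days, the context is effectively ignored and each image keeps its own best label. In this sketch, setting every gap to zero gives a fixed transition matrix, which is one way to approximate the no-timestamp setting described above.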

Author information

Authors and Affiliations

Department of Computer Science, University of Rochester, Rochester, New York, USA
Matthew Boutell & Christopher Brown
Research and Development Laboratories, Eastman Kodak Company, New York, USA
Matthew Boutell & Jiebo Luo

Corresponding author

Correspondence to Jiebo Luo.

Additional information

Matthew Boutell received the BS degree in Mathematical Science from Worcester Polytechnic Institute, Massachusetts, in 1993, the MEd degree from University of Massachusetts at Amherst in 1994, and the PhD degree in Computer Science from the University of Rochester, Rochester, NY, in 2005. He served for several years as a mathematics and computer science instructor at Norton High School and Stonehill College and as a research intern/consultant at Eastman Kodak Company. Currently, he is Assistant Professor of Computer Science and Software Engineering at Rose-Hulman Institute of Technology in Terre Haute, Indiana. His research interests include image understanding, machine learning, and probabilistic modeling.

Jiebo Luo received his PhD degree in Electrical Engineering from the University of Rochester, Rochester, NY in 1995. He is a Senior Principal Scientist with the Kodak Research Laboratories.

He was a member of the Organizing Committee of the 2002 IEEE International Conference on Image Processing and 2006 IEEE International Conference on Multimedia and Expo, a guest editor for the Journal of Wireless Communications and Mobile Computing Special Issue on Multimedia Over Mobile IP and the Pattern Recognition journal Special Issue on Image Understanding for Digital Photos, and a Member of the Kodak Research Scientific Council.

He is on the editorial boards of the IEEE Transactions on Multimedia, Pattern Recognition, and Journal of Electronic Imaging. His research interests include image processing, pattern recognition, computer vision, medical imaging, and multimedia communication. He has authored over 100 technical papers and holds over 30 granted US patents. He is a Kodak Distinguished Inventor and a Senior Member of the IEEE.

Chris Brown (BA Oberlin 1967, PhD University of Chicago 1972) is Professor of Computer Science at the University of Rochester.

He has published in many areas of computer vision and robotics. He wrote COMPUTER VISION with his colleague Dana Ballard, and influential work on the “active vision” paradigm was reported in two special issues of the International Journal of Computer Vision. He edited the first two volumes of ADVANCES IN COMPUTER VISION for Erlbaum and (with D. Terzopoulos) REAL-TIME COMPUTER VISION, from Cambridge University Press. He is the co-editor of VIDERE, the first entirely on-line refereed computer vision journal (MIT Press).

His most recent PhD students have done research in infrared tracking and face recognition, features and strategies for image understanding, augmented reality, and three-dimensional reconstruction algorithms.

He supervised the undergraduate team that twice won the AAAI Host Robot competition (and came third in the Robot Rescue competition in 2003).

About this article

Cite this article

Boutell, M., Luo, J. & Brown, C. A generalized temporal context model for classifying image collections. Multimedia Systems 11, 82–92 (2005). https://doi.org/10.1007/s00530-005-0202-7

Issue Date: November 2005
