A generalized temporal context model for classifying image collections

  • Regular Paper
Multimedia Systems

Abstract

Semantic scene classification is an open problem in computer vision, especially when information from only a single image is employed. In applications involving image collections, however, images are clustered sequentially, allowing surrounding images to be used as temporal context. We present a general probabilistic temporal context model in which the first-order Markov property is used to integrate content-based and temporal context cues. The model uses elapsed-time-dependent transition probabilities between images to capture the fact that images taken within a shorter period of time are more likely to be related. The model is generalized in that it allows arbitrary elapsed time between images, making it suitable for classifying image collections. In addition, we derive a variant of the model for ordered image collections that have no timestamp information, such as film scans. We applied the proposed context models to two problems and achieved significant gains in accuracy in both cases. The two algorithms used to implement inference within the context model, Viterbi and belief propagation, yielded similar results, with a slight edge to belief propagation.
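The kind of inference described in the abstract can be illustrated with a minimal sketch: Viterbi decoding over a first-order Markov chain whose transition matrix depends on the time elapsed between consecutive images. This is not the authors' implementation. The function names, the exponential-decay form of the transition probabilities, and the parameters `stay_prob` and `tau` are all assumptions made for illustration; the abstract does not specify how the transition probabilities are obtained. The per-image likelihoods stand in for the output of any content-based classifier and are assumed to be strictly positive.

```python
import math


def transition_matrix(dt, n_classes, stay_prob=0.9, tau=3600.0):
    """Hypothetical elapsed-time-dependent transition matrix (an assumption).

    At dt = 0 the probability of keeping the same class is stay_prob. As dt
    grows it decays toward 1 / n_classes, i.e. no temporal context at all.
    Requires n_classes >= 2 and stay_prob < 1.
    """
    uniform = 1.0 / n_classes
    same = uniform + (stay_prob - uniform) * math.exp(-dt / tau)
    other = (1.0 - same) / (n_classes - 1)
    return [[same if i == j else other for j in range(n_classes)]
            for i in range(n_classes)]


def viterbi(likelihoods, timestamps, stay_prob=0.9, tau=3600.0):
    """Most likely class sequence for a time-ordered image collection.

    likelihoods[t][c]: content-based score P(evidence_t | class c), > 0.
    timestamps[t]: capture time of image t in seconds, non-decreasing.
    A uniform class prior is assumed for the first image.
    """
    n = len(likelihoods[0])
    # Work in the log domain to avoid underflow on long sequences.
    score = [math.log(1.0 / n) + math.log(likelihoods[0][c]) for c in range(n)]
    backpointers = []
    for t in range(1, len(likelihoods)):
        dt = timestamps[t] - timestamps[t - 1]
        trans = transition_matrix(dt, n, stay_prob, tau)
        new_score, pointers = [], []
        for c in range(n):
            # Best predecessor class for class c at image t.
            prev = max(range(n), key=lambda p: score[p] + math.log(trans[p][c]))
            pointers.append(prev)
            new_score.append(score[prev] + math.log(trans[prev][c])
                             + math.log(likelihoods[t][c]))
        score = new_score
        backpointers.append(pointers)
    # Backtrack from the best final class.
    state = max(range(n), key=lambda c: score[c])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Middle image is weakly classified as class 1 when considered alone.
scores = [[0.9, 0.1], [0.45, 0.55], [0.9, 0.1]]
close_in_time = viterbi(scores, [0, 10, 20])
far_apart = viterbi(scores, [0, 10_000_000, 20_000_000])
```

In this toy example, the weakly classified middle image is relabeled to agree with its neighbors when the three images are taken seconds apart, but keeps its content-only label when the gaps are so long that the assumed transition matrix is effectively uniform.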


Author information

Corresponding author

Correspondence to Jiebo Luo.

Additional information

Matthew Boutell received the BS degree in Mathematical Science from Worcester Polytechnic Institute, Massachusetts, in 1993, the MEd degree from University of Massachusetts at Amherst in 1994, and the PhD degree in Computer Science from the University of Rochester, Rochester, NY, in 2005. He served for several years as a mathematics and computer science instructor at Norton High School and Stonehill College and as a research intern/consultant at Eastman Kodak Company. Currently, he is Assistant Professor of Computer Science and Software Engineering at Rose-Hulman Institute of Technology in Terre Haute, Indiana. His research interests include image understanding, machine learning, and probabilistic modeling.

Jiebo Luo received his PhD degree in Electrical Engineering from the University of Rochester, Rochester, NY in 1995. He is a Senior Principal Scientist with the Kodak Research Laboratories.

He was a member of the Organizing Committee of the 2002 IEEE International Conference on Image Processing and 2006 IEEE International Conference on Multimedia and Expo, a guest editor for the Journal of Wireless Communications and Mobile Computing Special Issue on Multimedia Over Mobile IP and the Pattern Recognition journal Special Issue on Image Understanding for Digital Photos, and a Member of the Kodak Research Scientific Council.

He is on the editorial boards of the IEEE Transactions on Multimedia, Pattern Recognition, and Journal of Electronic Imaging. His research interests include image processing, pattern recognition, computer vision, medical imaging, and multimedia communication. He has authored over 100 technical papers and holds over 30 granted US patents. He is a Kodak Distinguished Inventor and a Senior Member of the IEEE.

Chris Brown (BA Oberlin 1967, PhD University of Chicago 1972) is Professor of Computer Science at the University of Rochester.

He has published in many areas of computer vision and robotics. He wrote COMPUTER VISION with his colleague Dana Ballard, and influential work on the “active vision” paradigm was reported in two special issues of the International Journal of Computer Vision. He edited the first two volumes of ADVANCES IN COMPUTER VISION for Erlbaum and (with D. Terzopoulos) REAL-TIME COMPUTER VISION, from Cambridge University Press. He is the co-editor of VIDERE, the first entirely on-line refereed computer vision journal (MIT Press).

His most recent PhD students have done research in infrared tracking and face recognition, features and strategies for image understanding, augmented reality, and three-dimensional reconstruction algorithms.

He supervised the undergraduate team that twice won the AAAI Host Robot competition (and came third in the Robot Rescue competition in 2003).

About this article

Cite this article

Boutell, M., Luo, J. & Brown, C. A generalized temporal context model for classifying image collections. Multimedia Systems 11, 82–92 (2005). https://doi.org/10.1007/s00530-005-0202-7

  • DOI: https://doi.org/10.1007/s00530-005-0202-7
