Probabilistic modeling of scenes using object frames

  • Research Paper
  • Science China Information Sciences

Abstract

In this paper, we propose a probabilistic scene model using object frames, each of which is a group of co-occurring objects with fixed spatial relations. In contrast to standard co-occurrence models, which mostly explore the pairwise co-existence of objects, the proposed model captures the spatial relationships among groups of objects. Such information is closely tied to the semantics of the underlying scenes, which allows us to perform object detection and scene recognition in a unified framework. The proposed probabilistic model has two major components. The first models the dependencies between object frames and objects by adopting the Latent Dirichlet Allocation (LDA) model, originally developed for text analysis. The second characterizes the dependencies between object frames and scenes by establishing a mapping between global image features and object frame distributions. Experimental results show that the induced object frames are both semantically meaningful and spatially consistent. In addition, our model significantly improves the performance of object recognition and scene retrieval.
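To make the first component more concrete, the following is a minimal sketch of plain LDA applied to images treated as bags of object labels, so that each latent topic plays the role of a group of co-occurring objects. It is an illustration under stated assumptions, not the model of the paper: spatial relations between objects and the mapping from global image features are both omitted, collapsed Gibbs sampling is swapped in as the inference method, and the function name `fit_lda`, its parameters, and the toy labels are all hypothetical.

```python
import random


def fit_lda(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for plain LDA (illustrative sketch only).

    docs: list of images, each given as a list of object labels.
    Returns (theta, phi, vocab): per-image topic distributions,
    per-topic label distributions, and the sorted label vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]           # counts per (image, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # counts per (topic, label)
    topic_total = [0] * n_topics                         # labels assigned to each topic
    assign = []                                          # topic of every label occurrence

    # Random initialisation of topic assignments.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][index[w]] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v = index[w]
                z = assign[d][n]
                # Remove the current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Resample from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed point estimates of the two sets of distributions.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + n_words * beta) for c in topic_word[k]]
        for k in range(n_topics)
    ]
    return theta, phi, vocab


# Toy usage: four hypothetical images described by their object labels.
images = [
    ["bed", "lamp", "pillow"],
    ["car", "road", "tree"],
    ["bed", "pillow"],
    ["road", "car"],
]
theta, phi, vocab = fit_lda(images, n_topics=2, n_iters=50)
for k, row in enumerate(phi):
    top = sorted(zip(row, vocab), reverse=True)[:3]
    print(k, [label for _, label in top])
```

Each row of `phi` is a distribution over object labels for one topic, and each row of `theta` is the topic mixture of one image. In a model of the kind described above, the topic mixture would additionally be tied to global image features rather than drawn from a fixed Dirichlet prior.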


Author information

Corresponding author

Correspondence to Hao Su.

About this article

Cite this article

Su, H., Yu, A.W. Probabilistic modeling of scenes using object frames. Sci. China Inf. Sci. 58, 1–13 (2015). https://doi.org/10.1007/s11432-014-5151-3

  • DOI: https://doi.org/10.1007/s11432-014-5151-3
