
Context modeling in computer vision: techniques, implications, and applications

Multimedia Tools and Applications

Abstract

In recent years there has been a surge of interest in context modeling for numerous applications in computer vision. The basic motivation behind these diverse efforts is generally the same—attempting to enhance current image analysis technologies by incorporating information from outside the target object, including scene analysis as well as metadata. However, many different approaches and applications have been proposed, leading to a somewhat inchoate literature that can be difficult to navigate. The current paper provides a ‘roadmap’ of this new research, including a discussion of the basic motivation behind context-modeling, an overview of the most representative techniques, and a discussion of specific applications in which contextual modeling has been incorporated. This review is intended to introduce researchers in computer vision and image analysis to this increasingly important field as well as provide a reference for those who may wish to incorporate context modeling in their own work.



Notes

  1. http://www.slideshare.net/jain49/contenxt-100407

  2. http://web.mit.edu/torralba/www/

  3. http://cvcl.mit.edu/Aude.htm

  4. http://www.cs.cmu.edu/∼efros/

  5. http://www.cs.uiuc.edu/homes/dhoiem/

  6. http://userweb.cs.utexas.edu/∼grauman/

  7. http://vicos.fri.uni-lj.si/alesl/

References

  1. Alvarez G, Oliva A (2008) The representation of simple ensemble visual features outside the focus of attention. Psychol Sci 19(4):392–398

  2. Amores J, Radeva P (2005) Registration and retrieval of highly elastic bodies using contextual information. Pattern Recogn Lett 26(11):1720–1731

  3. Amores J, Sebe N, Radeva P (2007) Context-based object-class recognition and retrieval by generalized correlograms. IEEE Trans Pattern Anal Mach Intell 29(10):1818–1833

  4. Ariely D (2001) Seeing sets: representation by statistical properties. Psychol Sci 12(2):157–162

  5. Auckland M (2007) Non-target objects can influence perceptual processes during object recognition. Psychon Bull Rev 14:332–337

  6. Bar M (2004) Visual objects in context. Nat Rev Neurosci 5(8):617–629

  7. Bar M, Aminoff E (2003) Cortical analysis of visual context. Neuron 38(2):347–358

  8. Bar M, Ullman S (1996) Spatial context in recognition. Perception 25(3):343–352

  9. Barenholtz E (2009) Quantifying the role of context in visual object recognition [abstract]. J Vis 9(8):800, 800a

  10. Belongie S, Malik J, Puzicha J (2002) Shape matching and object recognition using shape contexts. IEEE Trans Pattern Anal Mach Intell 24:509–522

  11. Biederman I, Mezzanotte R, Rabinowitz J (1982) Scene perception: detecting and judging objects undergoing relational violations. Cogn Psychol 14:143–147

  12. Biederman I, Rabinowitz JC, Glass AL, Stacy EW (1974) On the information extracted from a glance at a scene. J Exp Psychol 103(3):597–600

  13. Boyce JS, Pollatsek A, Rayner K (1989) Effect of background information on object identification. J Exp Psychol Hum Percept Perform 15(3):556–566

  14. Brockmole JR, Castelhano MS, Henderson JM (2006) Contextual cueing in naturalistic scenes: global and local contexts. J Exp Psychol Learn Mem Cogn 32(4):699–706

  15. Brockmole JR, Hambrick DZ, Windisch DJ, Henderson JM (2008) The role of meaning in contextual cueing: evidence from chess expertise. Q J Exp Psychol (Colchester) 61(12):1886–1896

  16. Cao L, Luo J, Kautz H, Huang T (2009) Image annotation within the context of personal photo collections using hierarchical event and scene models. IEEE Trans Multimedia 11(2):208–219

  17. Carbonetto P, de Freitas N, Barnard K (2004) A statistical model for general contextual object recognition. In: European conference on computer vision (ECCV), pp 350–362

  18. Choi MJ, Lim J, Torralba A, Willsky A (2010) Exploiting hierarchical context on a large database of object categories. In: Computer vision and pattern recognition (CVPR), 2010 IEEE conference on, pp 129–136

  19. Chong SC, Treisman A (2005) Attentional spread in the statistical processing of visual displays. Percept Psychophys 67(1):1–13

  20. Chong SC, Treisman A (2005) Statistical processing: computing the average size in perceptual groups. Vis Res 45(7):891–900

  21. Chua T-S, Tang J, Hong R, Li H, Luo Z, Zheng Y-T (2009) NUS-WIDE: a real-world web image database from National University of Singapore. In: Proc. of ACM conf. on image and video retrieval (CIVR’09). Santorini, Greece

  22. Chun M, Jiang Y (1999) Top-down attentional guidance based on implicit learning of visual covariation. Psychol Sci 10:360–365

  23. Chun M, Jiang Y (2003) Implicit, long-term spatial contextual memory. Percept Psychophys 65:72–80

  24. Chun MM, Jiang Y (1998) Contextual cueing: implicit learning and memory of visual context guides spatial attention. Cogn Psychol 36(1):28–71

  25. Cox D, Meyers E, Sinha P (2004) Contextually evoked object-specific responses in human visual cortex. Science 304:115–117

  26. Davenport JL, Potter MC (2004) Scene consistency in object and background perception. Psychol Sci 15(8):559–564

  27. Divvala S, Hoiem D, Hays J, Efros A, Hebert M (2009) An empirical study of context in object detection. In: Computer vision and pattern recognition, CVPR 2009. IEEE conference on, pp 1271–1278

  28. Endo N, Takeda Y (2005) Use of spatial context is restricted by relative position in implicit learning. Psychon Bull Rev 12(5):880–885

  29. Epstein R, Harris A, Stanley D, Kanwisher N (1999) The parahippocampal place area: recognition, navigation, or encoding? Neuron 23(1):115–125

  30. Epstein R, Kanwisher N (1998) A cortical representation of the local visual environment. Nature 392:598–601

  31. Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2010) The Pascal Visual Object Classes (VOC) challenge. Int J Comput Vis 88(2):303–338

  32. Fei-Fei L, Iyer A, Koch C, Perona P (2007) What do we perceive in a glance of a real-world scene? J Vis 7(1):10

  33. Felzenszwalb P, Girshick R, McAllester D (2010) Cascade object detection with deformable part models. In: Computer vision and pattern recognition (CVPR), 2010 IEEE conference on, pp 2241–2248

  34. Felzenszwalb PF, Huttenlocher DP (2004) Efficient graph-based image segmentation. Int J Comput Vis 59(2):167–181

  35. Ferrari V, Jurie F, Schmid C (2010) From images to shape models for object detection. Int J Comput Vis 87(3):284–303

  36. Fink M, Perona P (2003) Mutual boosting for contextual inference. In: Thrun S, Saul L, Schölkopf B (eds) Advances in neural information processing systems (NIPS). MIT Press, Cambridge, MA

  37. Fischler MA, Strat TM (1989) Recognizing objects in a natural environment: a contextual vision system (cvs). In: Proceedings of a workshop on image understanding workshop. San Francisco, CA, USA. Morgan Kaufmann, pp 774–796

  38. Forsyth D, Malik J, Fleck M, Greenspan H, Leung T, Belongie S, Carson C, Bregler C (1996) Finding pictures of objects in large collections of images. Technical report, UC Berkeley, Berkeley, CA, USA

  39. Galleguillos C, Belongie S (2010) Context based object categorization: a critical survey. Comput Vis Image Underst (CVIU) 114:712–722

  40. Galleguillos C, McFee B, Belongie S, Lanckriet GRG (2010) Multi-class object localization by combining local contextual interactions. In: IEEE conference on computer vision and pattern recognition (CVPR)

  41. Galleguillos C, Rabinovich A, Belongie S (2008) Object categorization using co-occurrence, location and appearance. In: Proc. IEEE conf. computer vision and pattern recognition (CVPR), pp 1–8

  42. Goujon A, Didierjean A, Marmèche E (2007) Contextual cueing based on specific and categorical properties of the environment. Vis Cogn 15:257–275

  43. Goujon A, Didierjean A, Marmèche E (2009) Semantic contextual cuing and visual attention. J Exp Psychol Hum Percept Perform 35(1):50–71

  44. Graef PD, Troy AD, D’Ydewalle G (1992) Local and global contextual constraints on the identification of objects in scenes. Can J Psychol 46(3):489–508

  45. Greene M, Oliva A (2009) Recognition of natural scenes from global properties: seeing the forest without representing the trees. Cogn Psychol 58:137–176

  46. Gronau N, Neta M, Bar M (2008) Integrated contextual representation for objects’ identities and their locations. J Cogn Neurosci 20(3):371–388

  47. Gupta A, Efros AA, Hebert M (2010) Blocks world revisited: image understanding using qualitative geometry and mechanics. In: European conference on computer vision (ECCV)

  48. Hays J, Efros A (2008) IM2GPS: estimating geographic information from a single image. In: Computer vision and pattern recognition, CVPR 2008. IEEE conference on, pp 1–8

  49. Hedau V, Hoiem D, Forsyth D (2009) Recovering the spatial layout of cluttered rooms. In: Computer vision, 2009 IEEE 12th international conference on, pp 1849–1856

  50. Hedau V, Hoiem D, Forsyth D (2010) Thinking inside the box: using appearance models and context based on room geometry. In: Daniilidis K, Maragos P, Paragios N (eds) Computer vision – ECCV 2010. Springer Berlin, Heidelberg, pp 224–237

  51. Heitz G, Koller D (2008) Learning spatial context: using stuff to find things. In: ECCV ’08: Proceedings of the 10th European conference on computer vision. Springer, Berlin, pp 30–43

  52. Henderson JM, Hollingworth A (1999) High-level scene perception. Annu Rev Psychol 50:243–271

  53. Hidalgo-Sotelo B, Oliva A, Torralba A (2005) Human learning of contextual priors for object search: where does the time go? In: Proceedings of the IEEE Computer Society conference on computer vision and pattern recognition, vol 3, pp 86–93

  54. Hock H (1974) Contextual relations: the influence of familiarity, physical plausibility, and belongingness. Percept Psychophys 16:4–8

  55. Hoiem D, Efros A, Hebert M (2008) Closing the loop in scene interpretation. In: Computer vision and pattern recognition, CVPR 2008. IEEE conference on, pp 1–8

  56. Hoiem D, Efros AA, Hebert M (2005) Automatic photo pop-up. ACM Trans Graph (SIGGRAPH 2005) 24(3):577–584

  57. Hoiem D, Efros AA, Hebert M (2005) Geometric context from a single image. In: ICCV ’05: Proceedings of the tenth IEEE international conference on computer vision (ICCV’05), vol 1. IEEE Computer Society, Washington, pp 654–661

  58. Hoiem D, Efros AA, Hebert M (2007) Recovering surface layout from an image. Int J Comput Vis 75(1):151–172

  59. Hoiem D, Efros AA, Hebert M (2008) Putting objects in perspective. Int J Comput Vis 80(1):3–15

  60. Hoiem D, Stein A, Efros A, Hebert M (2007) Recovering occlusion boundaries from a single image. In: Computer vision, ICCV 2007. IEEE 11th international conference on, pp 1–8

  61. Hollingworth A, Henderson JM (1998) Does consistent scene context facilitate object perception? J Exp Psychol Gen 127(4):398–415

  62. Huang J, Kumar S, Mitra M, Zhu W, Zabih R (1997) Image indexing using color correlograms. In: Proc. CVPR, pp 762–768

  63. Hwang SJ, Grauman K (2010) Reading between the lines: object localization using implicit cues from image tags. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). San Francisco, CA

  64. Itti L, Koch C (2001) Computational modelling of visual attention. Nat Rev Neurosci 2(3):194–203

  65. Joshi D, Luo J (2008) Inferring generic activities and events from image content and bags of geo-tags. In: CIVR ’08: Proceedings of the 2008 international conference on content-based image and video retrieval. ACM, New York, pp 37–46

  66. Kennedy LS, Naaman M (2008) Generating diverse and representative image search results for landmarks. In: WWW ’08: Proceeding of the 17th international conference on World Wide Web. ACM, New York, pp 297–306

  67. Koch C, Ullman S (1985) Shifts in selective visual attention: towards the underlying neural circuitry. Hum Neurobiol 4(4):219–227

  68. Kumar M, Torr P, Zisserman A (2005) OBJ CUT. In: Computer vision and pattern recognition, CVPR 2005. IEEE computer society conference on, vol 1, pp 18–25

  69. Kumar S, Hebert M (2005) A hierarchical field framework for unified context-based classification. In: ICCV ’05: Proceedings of the tenth IEEE international conference on computer vision. IEEE Computer Society, Washington, pp 1284–1291

  70. Kunar M (2007) Does contextual cueing guide the deployment of attention? J Exp Psychol Hum Percept Perform 33:816–828

  71. Lee YJ, Grauman K (2010) Object-graphs for context-aware category discovery. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). San Francisco, CA

  72. Leordeanu M, Hebert M, Sukthankar R (2007) Beyond local appearance: category recognition from pairwise interactions of simple features. In: Computer vision and pattern recognition, CVPR ’07. IEEE conference on, pp 1–8

  73. Luo J, Yu J, Joshi D, Hao W (2008) Event recognition: viewing the world with a third eye. In: MM ’08: Proceeding of the 16th ACM international conference on multimedia. ACM, New York, pp 1071–1080

  74. Marr D (1982) Vision. W. H. Freeman, San Francisco

  75. Modestino J, Zhang J (1992) A Markov random field model-based approach to image interpretation. IEEE Trans Pattern Anal Mach Intell 14:606–615

  76. Navon D (1977) Forest before trees: the precedence of global features in visual perception. Cogn Psychol 9:353–383

  77. O’Hare N, Lee H, Cooray S, Gurrin C, Jones G, Malobabic J, O’Connor N, Smeaton AF, Uscilowski B (2006) Mediassist: using content-based analysis and context to manage personal photo collections. In: 5th int. conf. on image and video retrieval. Tempe, AZ, pp 529–532

  78. O’Hare N, Smeaton A (2009) Context-aware person identification in personal photo collections. IEEE Trans Multimedia 11(2):220–228

  79. Oliva A, Torralba A (2001) Modeling the shape of the scene: a holistic representation of the spatial envelope. Int J Comput Vis 42(3):145–175

  80. Oliva A, Torralba A (2006) Building the gist of a scene: the role of global image features in recognition. Prog Brain Res 155:23–36

  81. Oliva A, Torralba A (2007) The role of context in object recognition. Trends Cogn Sci 11(12):520–527

  82. Opelt A, Pinz A, Zisserman A (2006) A boundary-fragment-model for object detection. In: Leonardis A, Bischof H, Pinz A (eds) Computer vision – ECCV 2006. Springer Berlin, Heidelberg, pp 575–588

  83. Opelt A, Pinz A, Zisserman A (2006) Incremental learning of object detectors using a visual shape alphabet. In: Computer vision and pattern recognition, 2006 IEEE computer society conference on, pp 3–10

  84. Opelt A, Pinz A, Zisserman A (2008) Learning an alphabet of shape and appearance for multi-class object detection. Int J Comput Vis 80(1):16–44

  85. Palmer S (1975) The effects of contextual scenes on the identification of objects. Mem Cogn 3:519–526

  86. Peissig J, Tarr M (2007) Visual object recognition: do we know more now than we did 20 years ago? Annu Rev Psychol 58:75–96

  87. Perko R, Leonardis A (2010) A framework for visual-context-aware object detection in still images. Comput Vis Image Underst (CVIU) 114:700–711

  88. Posner MI (1980) Orienting of attention. Q J Exp Psychol 32(1):3–25

  89. Potter MC (1976) Short-term conceptual memory for pictures. J Exp Psychol Hum Learn 2(5):509–522

  90. Potter MC, Faulconer BA (1975) Time to understand pictures and words. Nature 253(5491):437–438

  91. Quattoni A, Torralba A (2009) Recognizing indoor scenes. In: Computer vision and pattern recognition, CVPR 2009. IEEE conference on, pp 413–420

  92. Rabinovich A, Belongie S (2009) Scenes vs. objects: a comparative study of two approaches to context based recognition. In: Computer vision and pattern recognition workshops. CVPR workshops 2009. IEEE computer society conference on, pp 92–99

  93. Rabinovich A, Vedaldi A, Galleguillos C, Wiewiora E, Belongie S (2007) Objects in context. In: Computer vision. ICCV 2007. IEEE 11th international conference on, pp 1–8

  94. Rieger JW, Köchy N, Schalk F, Grüschow M, Heinze H-J (2008) Speed limits: orientation and semantic context interactions constrain natural scene discrimination dynamics. J Exp Psychol Hum Percept Perform 34(1):56–76

  95. Russell B, Torralba A, Liu C, Fergus R, Freeman W (2007) Object recognition by scene alignment. In: Platt JC, Koller D, Singer Y, Roweis S (eds) Advances in neural information processing systems (NIPS). MIT Press, Cambridge, MA, pp 1241–1248

  96. Russell BC, Torralba A, Murphy KP, Freeman WT (2008) LabelMe: a database and web-based tool for image annotation. Int J Comput Vis 77:157–173

  97. Saxena A, Sun M, Ng AY (2009) Make3D: learning 3D scene structure from a single still image. IEEE Trans Pattern Anal Mach Intell 31(5):824–840

  98. Schyns P, Oliva A (1994) From blobs to boundary edges: evidence for time- and spatial-scale-dependent scene recognition. Psychol Sci 5:195–200

  99. Selfridge OG (1955) Pattern recognition and modern computers. In: Proceedings of the western joint computer conference. IEEE, New York

  100. Siagian C, Itti L (2007) Rapid biologically-inspired scene classification using features shared with visual attention. IEEE Trans Pattern Anal Mach Intell 29(2):300–312

  101. Singhal A, Luo J, Zhu W (2003) Probabilistic spatial context models for scene content understanding. In: Computer vision and pattern recognition. Proceedings, 2003 IEEE computer society conference on, vol 1, pp I-235–I-241

  102. Smeulders A, Worring M, Santini S, Gupta A, Jain R (2000) Content-based image retrieval at the end of the early years. IEEE Trans Pattern Anal Mach Intell 22(12):1349–1380

  103. Strat T (1992) Natural object recognition. Springer-Verlag New York, Inc., New York, NY, USA

  104. Strat T (1993) Employing contextual information in computer vision. In: Proceedings of ARPA image understanding workshop, pp 217–229

  105. Strat T, Fischler M (1989) Context-based vision: recognition of natural scenes. In: Twenty-third Asilomar conference on signals, systems and computers, pp 532–536

  106. Strat T, Fischler M (1990) A context-based recognition system for natural scenes and complex domains. In: DARPA image understanding workshop, pp 456–472

  107. Strat T, Fischler M (1991) Context-based vision: recognizing objects using information from both 2-d and 3-d imagery. IEEE Trans Pattern Anal Mach Intell 13(10):1050–1065

  108. Strat T, Fua P, Connolly C (1997) Context-based vision. In: Radius: image understanding for imagery intelligence, pp 373–388

  109. Thorpe S, Fize D, Marlot C (1996) Speed of processing in the human visual system. Nature 381(6582):520–522

  110. Torralba A (2003) Contextual priming for object detection. Int J Comput Vis 53:169–191

  111. Torralba A (2003) Modeling global scene factors in attention. J Opt Soc Am A 20(7):1407–1418

  112. Torralba A, Murphy KP, Freeman W (2004) Contextual models for object detection using boosted random fields. In Saul LK, Weiss Y, Bottou L (eds) Advances in neural information processing systems (NIPS). MIT Press, Cambridge, MA, pp 1401–1408

  113. Torralba A, Murphy KP, Freeman WT (2010) Using the forest to see the trees: exploiting context for visual object detection and localization. Commun ACM 53(3):107–114

  114. Torralba A, Murphy KP, Freeman WT, Rubin MA (2003) Context-based vision system for place and object recognition. In: Proc ninth IEEE int computer vision conf, pp 273–280

  115. Torralba A, Oliva A (2003) Statistics of natural image categories. Network 14(3):391–412

  116. Torralba A, Oliva A, Castelhano MS, Henderson JM (2006) Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. Psychol Rev 113(4):766–786

  117. Torralba A, Sinha P (2001) Statistical context priming for object detection. In: Computer vision. ICCV 2001. Proceedings eighth IEEE international conference on, pp 763–770

  118. Treisman AM, Gelade G (1980) A feature-integration theory of attention. Cogn Psychol 12(1):97–136

  119. Viola P, Jones MJ (2004) Robust real-time face detection. Int J Comput Vis 57(2):137–154

  120. Zheng W-S, Gong S, Xiang T (2009) Quantifying contextual information for object detection. In: Computer vision, 2009 IEEE 12th international conference on, pp 932–939

  121. Wang X, Doretto G, Sebastian T, Rittscher J, Tu PH (2007) Shape and appearance context modeling. In: IEEE 11th international conference on computer vision (ICCV) 2007, 14–21 Oct 2007. Rio de Janeiro, Brazil, pp 1–8

  122. Wolf L, Bileschi S (2006) A critical view of context. Int J Comput Vis 69(2):251–261

  123. Xiao J, Hays J, Ehinger K, Oliva A, Torralba A (2010) SUN database: large-scale scene recognition from abbey to zoo. In: Computer vision and pattern recognition (CVPR), 2010 IEEE conference on, pp 3485–3492

  124. Yakimovsky Y, Feldman JA (1973) A semantics-based decision theory region analyzer. In: IJCAI’73: Proceedings of the 3rd international joint conference on artificial intelligence. Morgan Kaufmann, San Francisco, pp 580–588

  125. Yang Y, Hallman S, Ramanan D, Fowlkes C (2010) Layered object detection for multi-class segmentation. In: Computer vision and pattern recognition (CVPR), 2010 IEEE conference on, pp 3113–3120

  126. Yang YH, Wu PT, Lee CW, Lin KH, Hsu WH, Chen HH (2008) Contextseer: context search and recommendation at query time for shared consumer photos. In: MM ’08: Proceeding of the 16th ACM international conference on multimedia. ACM, New York, pp 199–208

  127. Yantis S (1993) Stimulus-driven attentional capture and attentional control settings. J Exp Psychol Hum Percept Perform 19(3):676–681

  128. Yantis S, Jonides J (1990) Abrupt visual onsets and selective attention: voluntary versus automatic allocation. J Exp Psychol Hum Percept Perform 16(1):121–134

Acknowledgements

The authors would like to thank Geraldine Morin, Pierre Gurdjos, Viorica Patraucean, and Jerôme Guenard, for the insightful discussions and constructive suggestions.

Author information

Corresponding author

Correspondence to Oge Marques.

Resources for researchers in the field

In this appendix we provide a brief survey of some of the most prominent research groups working on context modeling in computer vision and associated topics, as well as associated resources, such as datasets and open-source code.

1.1 Research groups

1.1.1 Torralba, Oliva, et al.

The work by Torralba (Note 2), Oliva (Note 3), and colleagues at MIT is among the most representative efforts in combining human behavioral and computational research on topics related to the broad themes of “visual scene understanding” and “object recognition”.

The paper by Oliva and Torralba [79], which introduces the spatial envelope as a global descriptor capable of capturing the ‘gist’ of a scene and demonstrates that it can distinguish among eight categories of natural scenes, has sparked tremendous interest in the computer vision research community. It can be considered the seminal reference for the most recent wave of work on context modeling. Other essential papers include [45, 53, 80, 81, 95, 98, 110–116].

Many project-related resources are also available online, including:

1.1.2 Efros, Hoiem et al.

The research by Efros (Note 4), Hoiem (currently at the University of Illinois at Urbana-Champaign; Note 5), and colleagues at Carnegie Mellon University has led to some of the most influential practical applications of the latest computer vision techniques to the problem of modeling and reconstructing 3D scenes and using contextual information to improve the performance of object detection and recognition solutions.

Essential reading includes [55–60]. Many project-related resources are also available, including:

1.1.3 The Make3D team

The Make3D project is another approach to learning depth and inferring a 3D model from a single image. It was created by Saxena, Ng, and their colleagues in the Stanford 3D reconstruction group [97]. Code and range image data for Make3D are available at http://make3d.cs.cornell.edu/code.html.

1.1.4 Heitz and Koller

Heitz and Koller (Stanford University) are the proponents of the “Things and stuff” (TAS) context model discussed earlier (Sections 3.2.2 and 4.3.2). Code and image data associated with their work can be found at: http://ai.stanford.edu/~gaheitz/Research/TAS/.

1.1.5 Grauman et al.

The work by Grauman (Note 6) and colleagues at the University of Texas at Austin covers a broad range of topics, from “learning and recognizing visual object categories” to “scalable methods for content-based retrieval and visual search”.

Project-related resources include:

1.1.6 Belongie, Rabinovich, Galleguillos, et al.

The work by Belongie, Rabinovich, Galleguillos, and colleagues at the Computer Vision Laboratory in the Computer Science and Engineering Department at U.C. San Diego has had a significant impact on the state of the art in “Context Based Object Categorization”. Recommended reading includes [39, 41, 92, 93].

1.1.7 Leonardis et al.

Leonardis (Note 7) and colleagues at the Visual Cognitive Systems Laboratory at the University of Ljubljana have been working in the field of context-aware object detection. For demonstration videos associated with the projects reported in [87], visit: http://vicos.fri.uni-lj.si/roli/research/context-aware-object-detection/. Three urban image datasets (Ljubljana, Graz, and Darmstadt) can be downloaded from: http://vicos.fri.uni-lj.si/downloads/.

1.2 Datasets

Computational approaches to contextual modeling and to object and scene recognition should be tested extensively on representative images. In recent years, several image collections have been made publicly available to the research community. This brings advantages such as time savings (capturing, selecting, organizing, and annotating images is very time-consuming) and the ability to benchmark a new algorithm or framework against previous approaches in the literature. In addition to proprietary image and video collections, several recent experiments in this field have used one of the following publicly available datasets:

  • PASCAL Visual Object Classes (VOC) dataset [31] http://pascallin.ecs.soton.ac.uk/challenges/VOC/

    The PASCAL VOC dataset consists of consumer photographs collected from Flickr and associated ground truth annotation (including coordinates of the rectangular areas delimiting an object of interest). The images are used in the context of two principal challenges: object classification and object detection. New datasets have been released each year since 2006. The images are organized into 20 classes as follows:

    • Person: person

    • Animal: bird, cat, cow, dog, horse, sheep

    • Vehicle: airplane, bicycle, boat, bus, car, motorbike, train

    • Indoor: bottle, chair, dining table, potted plant, sofa, TV/monitor

    The PASCAL VOC dataset has been used by Divvala and colleagues [27] and Heitz and Koller [51], among others.

    In spite of its great popularity and usefulness, the PASCAL dataset has been deemed not suitable for experiments with context-based object recognition algorithms by Choi et al. [18], because most images contain very few instances of a single object category (more than 50% of the images contain only a single object class) and also because objects’ bounding boxes occupy a large portion (typically 20%) of the image.
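    The two statistics cited above (the share of images containing a single object class, and the fraction of the image covered by a bounding box) can be computed directly from bounding-box annotations. The sketch below is a minimal illustration that assumes a hypothetical in-memory annotation format; the `ImageAnnotation` class and the toy data are made up for this example and are not part of the PASCAL VOC toolkit or the evaluation code of Choi et al.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ImageAnnotation:
    # Hypothetical container for one annotated image (assumption for this sketch).
    width: int
    height: int
    # Each box is (class_name, xmin, ymin, xmax, ymax) in pixels.
    boxes: List[Tuple[str, int, int, int, int]]


def single_class_fraction(images):
    """Fraction of images whose annotated boxes all share one object class."""
    single = sum(1 for im in images if len({b[0] for b in im.boxes}) == 1)
    return single / len(images)


def mean_box_area_fraction(images):
    """Average of (box area / image area) over every annotated box."""
    fractions = []
    for im in images:
        for _, x0, y0, x1, y1 in im.boxes:
            fractions.append((x1 - x0) * (y1 - y0) / (im.width * im.height))
    return sum(fractions) / len(fractions)


# Toy data: one single-class image and one two-class image.
toy = [
    ImageAnnotation(100, 100, [("dog", 0, 0, 50, 40)]),
    ImageAnnotation(100, 100, [("car", 10, 10, 60, 50),
                               ("person", 0, 0, 20, 100)]),
]
```

    On the toy data, half of the images contain a single class and every box covers one fifth of its image. Running the same two functions over a real annotation set would give the kind of figures quoted in the comparison.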

  • LabelMe dataset [96] http://labelme.csail.mit.edu/

    The LabelMe dataset consists of an ever-growing collection of 180,000+ images and associated annotations, contributed by its users in a collaborative way. The images—and associated MATLAB code to process, query, and annotate them—are publicly available and cover a wide range of topics and scenarios. One of the main criticisms of the LabelMe dataset refers to the fact that the dataset is incompletely labeled, since volunteer annotators are free to choose which objects to annotate, and which to omit, leading to difficulties in establishing precision and recall for detection and classification tasks [31]. Consequently, researchers interested in using LabelMe for their experimental evaluations typically adopt selected subsets of the database to use for training and testing, and ensure that these subsets are completely annotated. Subsets of the LabelMe dataset have been used by Oliva and Torralba [81], among others.
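    To see why incomplete labeling complicates evaluation, consider a detector that correctly finds an object the annotator chose to omit: the detection is scored as a false positive. The toy sketch below uses hypothetical object identifiers rather than real LabelMe annotations, and it matches detections to labels by identity instead of by bounding-box overlap, purely to illustrate the effect.

```python
def precision_recall(detections, ground_truth):
    """Precision and recall of a set of detections against a set of labels."""
    true_pos = len(detections & ground_truth)
    precision = true_pos / len(detections) if detections else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Objects actually present in a scene (hypothetical identifiers).
present = {"car_1", "car_2", "tree_1", "person_1"}
# A volunteer annotated only two of them.
annotated = {"car_1", "person_1"}
# The detector found three objects, all of which are really present.
detected = {"car_1", "car_2", "person_1"}

measured = precision_recall(detected, annotated)  # against incomplete labels
actual = precision_recall(detected, present)      # against complete labels
```

    Against the incomplete labels the detector appears to have precision 2/3 and recall 1.0, whereas against the full scene contents it has precision 1.0 and recall 0.75. This is why completely annotated subsets are preferred for benchmarking.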

  • SUN 09 dataset [18] http://web.mit.edu/∼myungjin/www/HContext.html

    A few months ago, a new dataset was proposed and made available to the research community: the SUN 09 dataset, which contains 12,000 annotated images covering a large number of scene categories (indoor and outdoor) with more than 200 object categories and 152,000 annotated object instances. SUN 09 contains images collected from multiple sources (Google, Flickr, Altavista, LabelMe) and does not include images of objects on white backgrounds or close-ups, i.e., images in which there is no significant context information. It has been annotated using LabelMe [96] by a single annotator and verified for consistency [18].

    In the SUN 09 dataset, the average object size is 5% of the image size, and a typical image contains seven different object categories. In their evaluation, Choi et al. demonstrate that SUN 09 contains richer contextual information when compared to PASCAL VOC 2007, using the same 20 categories. They also demonstrate that the contextual information learned from SUN 09 significantly improves the accuracy of object recognition tasks, and can even be used to identify out-of-context (e.g., due to wrong pose, scale, or co-occurrence) scenes [18].
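    One simple form of contextual information that densely annotated images support is object co-occurrence. The sketch below counts how often pairs of categories appear in the same image and flags pairs that were never seen together. It is an illustrative simplification with made-up labels and hypothetical function names, not the hierarchical context model of Choi et al. [18].

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(images):
    """Count, for each unordered category pair, the images containing both."""
    counts = Counter()
    for labels in images:
        for pair in combinations(sorted(set(labels)), 2):
            counts[pair] += 1
    return counts


def unseen_pairs(labels, counts):
    """Category pairs in `labels` that never co-occurred in the counted images."""
    return [p for p in combinations(sorted(set(labels)), 2) if counts[p] == 0]


# Made-up training annotations: one list of category labels per image.
training = [
    ["road", "car", "building"],
    ["road", "car", "person"],
    ["sofa", "table", "person"],
]
counts = cooccurrence_counts(training)

# A sofa on a road with a car is out of context under these counts.
suspicious = unseen_pairs(["road", "car", "sofa"], counts)
```

    A zero count is only weak evidence with so few images; a practical system would need far more data and would typically also model location and scale.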

  • SUN dataset [123] http://groups.csail.mit.edu/vision/SUN/

    The Scene UNderstanding (SUN) dataset was introduced earlier this year and is targeted at research in scene classification, which has been customarily tested on a fairly small (usually, 15 or fewer) number of semantic categories. The SUN dataset contains 899 categories and 130,519 images. The number of images varies across categories, but there are at least 100 images per category. Out of the 899 categories, in [123], the authors use 397 well-sampled categories to evaluate numerous state-of-the-art algorithms for scene recognition and establish new bounds of performance. All images, associated code, as well as training and testing partitions are available for download at the URL indicated above.

  • NUS-WIDE [21] http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm

    The NUS-WIDE is a publicly accessible image dataset created by the Lab for Media Search at National University of Singapore (NUS). The dataset includes:

    1. 269,648 images and the associated tags from Flickr, with a total number of 5,018 unique tags;

    2. six types of low-level features extracted from these images, including 64-D color histogram, 144-D color correlogram, 73-D edge direction histogram, 128-D wavelet texture, 225-D block-wise color moments and 500-D bag of words based on SIFT descriptions; and

    3. ground-truth for 81 concepts that can be used for evaluation.
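    As a quick check on the feature list above, concatenating all six descriptors would give a 1,134-dimensional vector per image. The sketch below records the stated dimensionalities and shows where each block would sit in such a concatenated vector. The dictionary keys and the `feature_slices` helper are illustrative names chosen for this example, not identifiers from the NUS-WIDE distribution.

```python
# Dimensionalities as stated in the dataset description; key names are assumptions.
FEATURE_DIMS = {
    "color_histogram": 64,
    "color_correlogram": 144,
    "edge_direction_histogram": 73,
    "wavelet_texture": 128,
    "block_color_moments": 225,
    "sift_bag_of_words": 500,
}


def feature_slices(dims):
    """Map each feature name to its (start, end) range in a concatenated vector."""
    slices, start = {}, 0
    for name, size in dims.items():
        slices[name] = (start, start + size)
        start += size
    return slices, start


slices, total_dim = feature_slices(FEATURE_DIMS)
```

    Keeping the offsets explicit makes it easy to select or reweight a single feature type when experimenting with the combined representation.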

Additionally, there are several web pages with lists of links to useful datasets for computer vision, including:

About this article

Cite this article

Marques, O., Barenholtz, E. & Charvillat, V. Context modeling in computer vision: techniques, implications, and applications. Multimed Tools Appl 51, 303–339 (2011). https://doi.org/10.1007/s11042-010-0631-y

  • DOI: https://doi.org/10.1007/s11042-010-0631-y
