Abstract
In recent years there has been a surge of interest in context modeling for numerous applications in computer vision. The basic motivation behind these diverse efforts is generally the same: to enhance current image analysis technologies by incorporating information from outside the target object, including scene analysis as well as metadata. However, the many different approaches and applications that have been proposed form a somewhat inchoate literature that can be difficult to navigate. The current paper provides a ‘roadmap’ of this new research, including a discussion of the basic motivation behind context modeling, an overview of the most representative techniques, and a discussion of specific applications in which contextual modeling has been incorporated. This review is intended to introduce researchers in computer vision and image analysis to this increasingly important field, as well as to provide a reference for those who may wish to incorporate context modeling in their own work.
References
Alvarez G, Oliva A (2008) The representation of simple ensemble visual features outside the focus of attention. Psychol Sci 19(4):392–398
Amores J, Radeva P (2005) Registration and retrieval of highly elastic bodies using contextual information. Pattern Recogn Lett 26(11):1720–1731
Amores J, Sebe N, Radeva P (2007) Context-based object-class recognition and retrieval by generalized correlograms. IEEE Trans Pattern Anal Mach Intell 29(10):1818–1833
Ariely D (2001) Seeing sets: representation by statistical properties. Psychol Sci 12(2):157–162
Auckland M (2007) Non-target objects can influence perceptual processes during object recognition. Psychon Bull Rev 14:332–337
Bar M (2004) Visual objects in context. Nat Rev Neurosci 5(8):617–629
Bar M, Aminoff E (2003) Cortical analysis of visual context. Neuron 38(2):347–358
Bar M, Ullman S (1996) Spatial context in recognition. Perception 25(3):343–352
Barenholtz E (2009) Quantifying the role of context in visual object recognition [abstract]. J Vis 9(8):800, 800a
Belongie S, Malik J, Puzicha J (2002) Shape matching and object recognition using shape contexts. IEEE Trans Pattern Anal Mach Intell 24:509–522
Biederman I, Mezzanotte RJ, Rabinowitz JC (1982) Scene perception: detecting and judging objects undergoing relational violations. Cogn Psychol 14:143–177
Biederman I, Rabinowitz JC, Glass AL, Stacy EW (1974) On the information extracted from a glance at a scene. J Exp Psychol 103(3):597–600
Boyce SJ, Pollatsek A, Rayner K (1989) Effect of background information on object identification. J Exp Psychol Hum Percept Perform 15(3):556–566
Brockmole JR, Castelhano MS, Henderson JM (2006) Contextual cueing in naturalistic scenes: global and local contexts. J Exp Psychol Learn Mem Cogn 32(4):699–706
Brockmole JR, Hambrick DZ, Windisch DJ, Henderson JM (2008) The role of meaning in contextual cueing: evidence from chess expertise. Q J Exp Psychol (Colchester) 61(12):1886–1896
Cao L, Luo J, Kautz H, Huang T (2009) Image annotation within the context of personal photo collections using hierarchical event and scene models. IEEE Trans Multimedia 11(2):208–219
Carbonetto P, de Freitas N, Barnard K (2004) A statistical model for general contextual object recognition. In: European conference on computer vision (ECCV), pp 350–362
Choi MJ, Lim J, Torralba A, Willsky A (2010) Exploiting hierarchical context on a large database of object categories. In: Computer vision and pattern recognition (CVPR), 2010 IEEE conference on, pp 129–136
Chong SC, Treisman A (2005) Attentional spread in the statistical processing of visual displays. Percept Psychophys 67(1):1–13
Chong SC, Treisman A (2005) Statistical processing: computing the average size in perceptual groups. Vis Res 45(7):891–900
Chua T-S, Tang J, Hong R, Li H, Luo Z, Zheng Y-T (2009) NUS-WIDE: a real-world web image database from National University of Singapore. In: Proc. of ACM conf. on image and video retrieval (CIVR’09). Santorini, Greece
Chun M, Jiang Y (1999) Top-down attentional guidance based on implicit learning of visual covariation. Psychol Sci 10:360–365
Chun M, Jiang Y (2003) Implicit, long-term spatial contextual memory. Percept Psychophys 65:72–80
Chun MM, Jiang Y (1998) Contextual cueing: implicit learning and memory of visual context guides spatial attention. Cogn Psychol 36(1):28–71
Cox D, Meyers E, Sinha P (2004) Contextually evoked object-specific responses in human visual cortex. Science 304:115–117
Davenport JL, Potter MC (2004) Scene consistency in object and background perception. Psychol Sci 15(8):559–564
Divvala S, Hoiem D, Hays J, Efros A, Hebert M (2009) An empirical study of context in object detection. In: Computer vision and pattern recognition, CVPR 2009. IEEE conference on, pp 1271–1278
Endo N, Takeda Y (2005) Use of spatial context is restricted by relative position in implicit learning. Psychon Bull Rev 12(5):880–885
Epstein R, Harris A, Stanley D, Kanwisher N (1999) The parahippocampal place area: recognition, navigation, or encoding? Neuron 23(1):115–125
Epstein R, Kanwisher N (1998) A cortical representation of the local visual environment. Nature 392:598–601
Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2010) The Pascal Visual Object Classes (VOC) challenge. Int J Comput Vis 88(2):303–338
Fei-Fei L, Iyer A, Koch C, Perona P (2007) What do we perceive in a glance of a real-world scene? J Vis 7(1):10
Felzenszwalb P, Girshick R, McAllester D (2010) Cascade object detection with deformable part models. In: Computer vision and pattern recognition (CVPR), 2010 IEEE conference on, pp 2241–2248
Felzenszwalb PF, Huttenlocher DP (2004) Efficient graph-based image segmentation. Int J Comput Vis 59(2):167–181
Ferrari V, Jurie F, Schmid C (2010) From images to shape models for object detection. Int J Comput Vis 87(3):284–303
Fink M, Perona P (2003) Mutual boosting for contextual inference. In: Thrun S, Saul L, Schölkopf B (eds) Advances in neural information processing systems (NIPS). MIT Press, Cambridge, MA
Fischler MA, Strat TM (1989) Recognizing objects in a natural environment: a contextual vision system (cvs). In: Proceedings of a workshop on image understanding workshop. San Francisco, CA, USA. Morgan Kaufmann, pp 774–796
Forsyth D, Malik J, Fleck M, Greenspan H, Leung T, Belongie S, Carson C, Bregler C (1996) Finding pictures of objects in large collections of images. Technical report, UC Berkeley, Berkeley, CA, USA
Galleguillos C, Belongie S (2010) Context based object categorization: a critical survey. Comput Vis Image Underst (CVIU) 114:712–722
Galleguillos C, McFee B, Belongie S, Lanckriet GRG (2010) Multi-class object localization by combining local contextual interactions. In: IEEE conference on computer vision and pattern recognition (CVPR)
Galleguillos C, Rabinovich A, Belongie S (2008) Object categorization using co-occurrence, location and appearance. In: Proc. IEEE conf. computer vision and pattern recognition (CVPR), pp 1–8
Goujon A, Didierjean A, Marmèche E (2007) Contextual cueing based on specific and categorical properties of the environment. Vis Cogn 15:257–275
Goujon A, Didierjean A, Marmèche E (2009) Semantic contextual cuing and visual attention. J Exp Psychol Hum Percept Perform 35(1):50–71
Graef PD, Troy AD, D’Ydewalle G (1992) Local and global contextual constraints on the identification of objects in scenes. Can J Psychol 46(3):489–508
Greene M, Oliva A (2009) Recognition of natural scenes from global properties: seeing the forest without representing the trees. Cogn Psychol 58:137–176
Gronau N, Neta M, Bar M (2008) Integrated contextual representation for objects’ identities and their locations. J Cogn Neurosci 20(3):371–388
Gupta A, Efros AA, Hebert M (2010) Blocks world revisited: image understanding using qualitative geometry and mechanics. In: European conference on computer vision (ECCV)
Hays J, Efros A (2008) IM2GPS: estimating geographic information from a single image. In: Computer vision and pattern recognition, CVPR 2008. IEEE conference on, pp 1–8
Hedau V, Hoiem D, Forsyth D (2009) Recovering the spatial layout of cluttered rooms. In: Computer vision, 2009 IEEE 12th international conference on, pp 1849–1856
Hedau V, Hoiem D, Forsyth D (2010) Thinking inside the box: using appearance models and context based on room geometry. In: Daniilidis K, Maragos P, Paragios N (eds) Computer vision – ECCV 2010. Springer Berlin, Heidelberg, pp 224–237
Heitz G, Koller D (2008) Learning spatial context: using stuff to find things. In: ECCV ’08: Proceedings of the 10th European conference on computer vision. Springer, Berlin, pp 30–43
Henderson JM, Hollingworth A (1999) High-level scene perception. Annu Rev Psychol 50:243–271
Hidalgo-Sotelo B, Oliva A, Torralba A (2005) Human learning of contextual priors for object search: where does the time go? In: Proceedings of the IEEE Computer Society conference on computer vision and pattern recognition, vol 3, pp 86–93
Hock H (1974) Contextual relations: the influence of familiarity, physical plausibility, and belongingness. Percept Psychophys 16:4–8
Hoiem D, Efros A, Hebert M (2008) Closing the loop in scene interpretation. In: Computer vision and pattern recognition, CVPR 2008. IEEE conference on, pp 1–8
Hoiem D, Efros AA, Hebert M (2005) Automatic photo pop-up. ACM Trans Graph (SIGGRAPH 2005) 24(3):577–584
Hoiem D, Efros AA, Hebert M (2005) Geometric context from a single image. In: ICCV ’05: Proceedings of the tenth IEEE international conference on computer vision (ICCV’05), vol 1. IEEE Computer Society, Washington, pp 654–661
Hoiem D, Efros AA, Hebert M (2007) Recovering surface layout from an image. Int J Comput Vis 75(1):151–172
Hoiem D, Efros AA, Hebert M (2008) Putting objects in perspective. Int J Comput Vis 80(1):3–15
Hoiem D, Stein A, Efros A, Hebert M (2007) Recovering occlusion boundaries from a single image. In: Computer vision, ICCV 2007. IEEE 11th international conference on, pp 1–8
Hollingworth A, Henderson JM (1998) Does consistent scene context facilitate object perception? J Exp Psychol Gen 127(4):398–415
Huang J, Kumar S, Mitra M, Zhu W, Zabih R (1997) Image indexing using color correlograms. In: Proc. CVPR, pp 762–768
Hwang SJ, Grauman K (2010) Reading between the lines: object localization using implicit cues from image tags. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). San Francisco, CA
Itti L, Koch C (2001) Computational modelling of visual attention. Nat Rev Neurosci 2(3):194–203
Joshi D, Luo J (2008) Inferring generic activities and events from image content and bags of geo-tags. In: CIVR ’08: Proceedings of the 2008 international conference on content-based image and video retrieval. ACM, New York, pp 37–46
Kennedy LS, Naaman M (2008) Generating diverse and representative image search results for landmarks. In: WWW ’08: Proceeding of the 17th international conference on World Wide Web. ACM, New York, pp 297–306
Koch C, Ullman S (1985) Shifts in selective visual attention: towards the underlying neural circuitry. Hum Neurobiol 4(4):219–227
Kumar M, Torr P, Zisserman A (2005) OBJ CUT. In: Computer vision and pattern recognition, CVPR 2005. IEEE computer society conference on, vol 1, pp 18–25
Kumar S, Hebert M (2005) A hierarchical field framework for unified context-based classification. In: ICCV ’05: Proceedings of the tenth IEEE international conference on computer vision. IEEE Computer Society, Washington, pp 1284–1291
Kunar M (2007) Does contextual cueing guide the deployment of attention? J Exp Psychol Hum Percept Perform 33:816–828
Lee YJ, Grauman K (2010) Object-graphs for context-aware category discovery. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). San Francisco, CA
Leordeanu M, Hebert M, Sukthankar R (2007) Beyond local appearance: category recognition from pairwise interactions of simple features. In: Computer vision and pattern recognition, CVPR ’07. IEEE conference on, pp 1–8
Luo J, Yu J, Joshi D, Hao W (2008) Event recognition: viewing the world with a third eye. In: MM ’08: Proceeding of the 16th ACM international conference on multimedia. ACM, New York, pp 1071–1080
Marr D (1982) Vision. W. H. Freeman, San Francisco
Modestino J, Zhang J (1992) A Markov random field model-based approach to image interpretation. IEEE Trans Pattern Anal Mach Intell 14:606–615
Navon D (1977) Forest before trees: the precedence of global features in visual perception. Cogn Psychol 9:353–383
O’Hare N, Lee H, Cooray S, Gurrin C, Jones G, Malobabic J, O’Connor N, Smeaton AF, Uscilowski B (2006) Mediassist: using content-based analysis and context to manage personal photo collections. In: 5th int. conf. on image and video retrieval. Tempe, AZ, pp 529–532
O’Hare N, Smeaton A (2009) Context-aware person identification in personal photo collections. IEEE Trans Multimedia 11(2):220–228
Oliva A, Torralba A (2001) Modeling the shape of the scene: a holistic representation of the spatial envelope. Int J Comput Vis 42(3):145–175
Oliva A, Torralba A (2006) Building the gist of a scene: the role of global image features in recognition. Prog Brain Res 155:23–36
Oliva A, Torralba A (2007) The role of context in object recognition. Trends Cogn Sci 11(12):520–527
Opelt A, Pinz A, Zisserman A (2006) A boundary-fragment-model for object detection. In: Leonardis A, Bischof H, Pinz A (eds) Computer vision – ECCV 2006. Springer Berlin, Heidelberg, pp 575–588
Opelt A, Pinz A, Zisserman A (2006) Incremental learning of object detectors using a visual shape alphabet. In: Computer vision and pattern recognition, 2006 IEEE computer society conference on, pp 3–10
Opelt A, Pinz A, Zisserman A (2008) Learning an alphabet of shape and appearance for multi-class object detection. Int J Comput Vis 80(1):16–44
Palmer S (1975) The effects of contextual scenes on the identification of objects. Mem Cogn 3:519–526
Peissig J, Tarr M (2007) Visual object recognition: do we know more now than we did 20 years ago? Annu Rev Psychol 58:75–96
Perko R, Leornardis A (2010) A framework for visual-context-aware object detection in still images. Comput Vis Image Underst (CVIU) 114:700–711
Posner MI (1980) Orienting of attention. Q J Exp Psychol 32(1):3–25
Potter MC (1976) Short-term conceptual memory for pictures. J Exp Psychol Hum Learn 2(5):509–522
Potter MC, Faulconer BA (1975) Time to understand pictures and words. Nature 253(5491):437–438
Quattoni A, Torralba A (2009) Recognizing indoor scenes. In: Computer vision and pattern recognition, CVPR 2009. IEEE conference on, pp 413–420
Rabinovich A, Belongie S (2009) Scenes vs. objects: a comparative study of two approaches to context based recognition. In: Computer vision and pattern recognition workshops. CVPR workshops 2009. IEEE computer society conference on, pp 92–99
Rabinovich A, Vedaldi A, Galleguillos C, Wiewiora E, Belongie S (2007) Objects in context. In: Computer vision. ICCV 2007. IEEE 11th international conference on, pp 1–8
Rieger JW, Köchy N, Schalk F, Grüschow M, Heinze H-J (2008) Speed limits: orientation and semantic context interactions constrain natural scene discrimination dynamics. J Exp Psychol Hum Percept Perform 34(1):56–76
Russell B, Torralba A, Liu C, Fergus R, Freeman W (2007) Object recognition by scene alignment. In: Platt JC, Koller D, Singer Y, Roweis S (eds) Advances in neural information processing systems (NIPS). MIT Press, Cambridge, MA, pp 1241–1248
Russell BC, Torralba A, Murphy KP, Freeman WT (2008) Labelme: a database and web-based tool for image annotation. Int J Comput Vis 77:157–173
Saxena A, Sun M, Ng AY (2009) Make3d: learning 3d scene structure from a single still image. IEEE Trans Pattern Anal Mach Intell 31(5):824–840
Schyns P, Oliva A (1994) From blobs to boundary edges: evidence for time- and spatial-scale-dependent scene recognition. Psychol Sci 5:195–200
Selfridge OG (1955) Pattern recognition and modern computers. In: Proceedings of the western joint computer conference. IEEE, New York
Siagian C, Itti L (2007) Rapid biologically-inspired scene classification using features shared with visual attention. IEEE Trans Pattern Anal Mach Intell 29(2):300–312
Singhal A, Luo J, Zhu W (2003) Probabilistic spatial context models for scene content understanding. In: Computer vision and pattern recognition. Proceedings, 2003 IEEE computer society conference on, vol 1, pp I-235–I-241
Smeulders A, Worring M, Santini S, Gupta A, Jain R (2000) Content-based image retrieval at the end of the early years. IEEE Trans Pattern Anal Mach Intell 22(12):1349–1380
Strat T (1992) Natural object recognition. Springer-Verlag New York, Inc., New York, NY, USA
Strat T (1993) Employing contextual information in computer vision. In: Proceedings of ARPA image understanding workshop, pp 217–229
Strat T, Fischler M (1989) Context-based vision: recognition of natural scenes. In: Twenty-third Asilomar conference on signals, systems and computers, pp 532–536
Strat T, Fischler M (1990) A context-based recognition system for natural scenes and complex domains. In: DARPA image understanding workshop, pp 456–472
Strat T, Fischler M (1991) Context-based vision: recognizing objects using information from both 2-d and 3-d imagery. IEEE Trans Pattern Anal Mach Intell 13(10):1050–1065
Strat T, Fua P, Connolly C (1997) Context-based vision. In: Radius: image understanding for imagery intelligence, pp 373–388
Thorpe S, Fize D, Marlot C (1996) Speed of processing in the human visual system. Nature 381(6582):520–522
Torralba A (2003) Contextual priming for object detection. Int J Comput Vis 53:169–191
Torralba A (2003) Modeling global scene factors in attention. J Opt Soc Am A 20(7):1407–1418
Torralba A, Murphy KP, Freeman W (2004) Contextual models for object detection using boosted random fields. In: Saul LK, Weiss Y, Bottou L (eds) Advances in neural information processing systems (NIPS). MIT Press, Cambridge, MA, pp 1401–1408
Torralba A, Murphy KP, Freeman WT (2010) Using the forest to see the trees: exploiting context for visual object detection and localization. Commun ACM 53(3):107–114
Torralba A, Murphy KP, Freeman WT, Rubin MA (2003) Context-based vision system for place and object recognition. In: Proc. ninth IEEE international conference on computer vision, pp 273–280
Torralba A, Oliva A (2003) Statistics of natural image categories. Network 14(3):391–412
Torralba A, Oliva A, Castelhano MS, Henderson JM (2006) Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. Psychol Rev 113(4):766–786
Torralba A, Sinha P (2001) Statistical context priming for object detection. In: Computer vision. ICCV 2001. Proceedings eighth IEEE international conference on, pp 763–770
Treisman AM, Gelade G (1980) A feature-integration theory of attention. Cogn Psychol 12(1):97–136
Viola P, Jones MJ (2004) Robust real-time face detection. Int J Comput Vision 57(2):137–154
Zheng W-S, Gong S, Xiang T (2009) Quantifying contextual information for object detection. In: Computer vision, 2009 IEEE 12th international conference on, pp 932–939
Wang X, Doretto G, Sebastian T, Rittscher J, Tu PH (2007) Shape and appearance context modeling. In: IEEE 11th international conference on computer vision (ICCV) 2007, 14–21 Oct 2007. Rio de Janeiro, Brazil, pp 1–8
Wolf L, Bileschi S (2006) A critical view of context. Int J Comput Vision 69(2):251–261
Xiao J, Hays J, Ehinger K, Oliva A, Torralba A (2010) SUN database: large-scale scene recognition from abbey to zoo. In: Computer vision and pattern recognition (CVPR), 2010 IEEE conference on, pp 3485–3492
Yakimovsky Y, Feldman JA (1973) A semantics-based decision theory region analyzer. In: IJCAI’73: Proceedings of the 3rd international joint conference on artificial intelligence. Morgan Kaufmann, San Francisco, pp 580–588
Yang Y, Hallman S, Ramanan D, Fowlkes C (2010) Layered object detection for multi-class segmentation. In: Computer vision and pattern recognition (CVPR), 2010 IEEE conference on, pp 3113–3120
Yang YH, Wu PT, Lee CW, Lin KH, Hsu WH, Chen HH (2008) Contextseer: context search and recommendation at query time for shared consumer photos. In: MM ’08: Proceeding of the 16th ACM international conference on multimedia. ACM, New York, pp 199–208
Yantis S (1993) Stimulus-driven attentional capture and attentional control settings. J Exp Psychol Hum Percept Perform 19(3):676–681
Yantis S, Jonides J (1990) Abrupt visual onsets and selective attention: voluntary versus automatic allocation. J Exp Psychol Hum Percept Perform 16(1):121–134
Acknowledgements
The authors would like to thank Geraldine Morin, Pierre Gurdjos, Viorica Patraucean, and Jérôme Guenard for their insightful discussions and constructive suggestions.
Resources for researchers in the field
In this appendix we provide a brief survey of some of the most prominent research groups working on context modeling in computer vision and related topics, along with useful resources such as datasets and open-source code.
1.1 Research groups
1.1.1 Torralba, Oliva, et al.
The work by Torralba, Oliva, and colleagues at MIT is among the most representative efforts in combining human behavioral and computational research on topics related to the broad themes of “visual scene understanding” and “object recognition”.
In [79], Oliva and Torralba introduced the spatial envelope, a global descriptor capable of capturing the ‘gist’ of a scene, and demonstrated that it can be used to distinguish among eight categories of natural scenes. That paper sparked tremendous interest in the computer vision research community and can be considered the seminal reference at the beginning of the most recent wave of work on context modeling. Other essential papers include [45, 53, 80, 81, 95, 98, 110–116].
Many project-related resources are also available online, including:
-
MATLAB code, datasets, and examples of results for the “spatial envelope” scene representation [79]: http://people.csail.mit.edu/torralba/code/spatialenvelope/. Images for each of the eight scene categories can also be downloaded from http://cvcl.mit.edu/database.htm.
-
Datasets and examples of results for the “Place and scene recognition from video” project [114]: http://www.cs.ubc.ca/∼murphyk/Vision/placeRecognition.html.
-
Datasets, tools, and examples of results for contextual priming for object detection [117]: http://web.mit.edu/torralba/www/carsAndFacesInContext.html.
-
MATLAB code, datasets, tools, and examples of results for contextual guidance of eye movements and attention in real-world scenes [116]: http://people.csail.mit.edu/torralba/GlobalFeaturesAndAttention/.
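To convey the intuition behind the ‘gist’ idea in [79], the sketch below filters a grayscale image with a small bank of Gabor filters and pools the response energy over a coarse spatial grid. This is a simplified illustration only; the filter sizes, frequencies, and pooling grid chosen here are arbitrary, and the code is not the authors’ released MATLAB implementation linked above.

```python
import numpy as np

def gabor_kernel(size, theta, freq):
    # 2-D Gabor kernel: cosine carrier at orientation theta under a Gaussian envelope
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * (size / 4) ** 2))
    return envelope * np.cos(2 * np.pi * freq * xr)

def gist_descriptor(img, n_orientations=4, freqs=(0.1, 0.25), grid=4):
    """Simplified gist: filter with a small Gabor bank, then average each
    response magnitude over a grid x grid spatial layout (parameters are
    illustrative, not those of the original spatial envelope code)."""
    img = img.astype(float)
    h, w = img.shape
    feats = []
    for freq in freqs:
        for i in range(n_orientations):
            k = gabor_kernel(15, np.pi * i / n_orientations, freq)
            # frequency-domain convolution; zero-pad kernel to image size
            resp = np.abs(np.fft.ifft2(np.fft.fft2(img) *
                                       np.fft.fft2(k, s=(h, w))))
            # pool the response over a coarse spatial grid
            for rows in np.array_split(resp, grid, axis=0):
                for block in np.array_split(rows, grid, axis=1):
                    feats.append(block.mean())
    return np.array(feats)  # len(freqs) * n_orientations * grid * grid values

# toy example: a sinusoidal grating yields a fixed-length global descriptor
img = np.tile(np.sin(np.linspace(0, 8 * np.pi, 64)), (64, 1)).T
g = gist_descriptor(img)
print(g.shape)  # (128,) = 2 frequencies x 4 orientations x 4x4 grid
```

The key property, as in the spatial envelope work, is that the descriptor summarizes the whole scene without segmenting or detecting any individual object.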
1.1.2 Efros, Hoiem et al.
The research by Efros, Hoiem (currently at the University of Illinois at Urbana-Champaign), and colleagues at Carnegie Mellon University has led to some of the most influential practical applications of recent computer vision techniques to the problems of modeling and reconstructing 3D scenes and of using contextual information to improve the performance of object detection and recognition.
Essential reading includes [55–60]. Many project-related resources are also available, including:
-
MATLAB code, datasets, and examples of results for the “Automatic Photo Pop-up” project [56]: http://www.cs.uiuc.edu/homes/dhoiem/projects/popup/index.html
-
MATLAB code, datasets, and examples of results for the “Surface Context” project [57, 58]: http://www.cs.uiuc.edu/homes/dhoiem/projects/context/index.html
-
Dataset for the “Putting Objects in Perspective” project [59]: http://www.cs.uiuc.edu/homes/dhoiem/projects/pop/index.html
-
Code and ground-truth data for the “Recovering the Spatial Layout of Cluttered Rooms” project [49]: https://netfiles.uiuc.edu/vhedau2/www/Research/research_spatialLayout.html
1.1.3 The Make3D team
The Make3D project takes another approach to learning depth and inferring a 3D model from a single image. It was created by Saxena, Ng, and colleagues in the Stanford 3D reconstruction group [97]. Code and range-image data for Make3D are available at http://make3d.cs.cornell.edu/code.html.
1.1.4 Heitz and Koller
Heitz and Koller (Stanford University) are the proponents of the “Things and stuff” (TAS) context model discussed earlier (Sections 3.2.2 and 4.3.2). Code and image data associated with their work can be found at: http://ai.stanford.edu/∼gaheitz/Research/TAS/.
1.1.5 Grauman et al.
The work by Grauman and colleagues at the University of Texas at Austin covers a broad range of topics, from “learning and recognizing visual object categories” to “scalable methods for content-based retrieval and visual search”.
Project-related resources include:
-
Datasets and examples of results for the “Reading Between The Lines: Object Localization Using Implicit Cues from Image Tags” project [63]: http://userweb.cs.utexas.edu/∼sjhwang/tags.html
-
MATLAB code, supplementary materials, and examples of results for the “Object-Graphs for Context-Aware Category Discovery” project [71]: http://userweb.cs.utexas.edu/∼grauman/research/projects/objectgraph/objectgraph.html
1.1.6 Belongie, Rabinovich, Galleguillos, et al.
The work by Belongie, Rabinovich, Galleguillos, and colleagues at the Computer Vision Laboratory in the Computer Science and Engineering Department at U.C. San Diego has significantly advanced the state of the art in “Context Based Object Categorization”. Recommended reading includes [39, 41, 92, 93].
1.1.7 Leonardis et al.
Leonardis and colleagues at the Visual Cognitive Systems Laboratory at the University of Ljubljana have been working in the field of context-aware object detection. For demonstration videos associated with the projects reported in [87], visit: http://vicos.fri.uni-lj.si/roli/research/context-aware-object-detection/. Three urban image datasets (Ljubljana, Graz, and Darmstadt) can be downloaded from: http://vicos.fri.uni-lj.si/downloads/.
1.2 Datasets
Experimental research on computational approaches to context modeling and to object and scene recognition should be tested extensively on representative images. In recent years, several image collections have been made publicly available to the research community. These bring many advantages, such as time savings (capturing, selecting, organizing, and annotating images is a very time-consuming process) and the ability to benchmark a new algorithm or framework against previous approaches in the literature. In addition to proprietary image and video collections, several recent experiments in this field have used one of the following publicly available datasets:
-
PASCAL Visual Object Classes (VOC) dataset [31] http://pascallin.ecs.soton.ac.uk/challenges/VOC/
The PASCAL VOC dataset consists of consumer photographs collected from Flickr and associated ground truth annotation (including coordinates of the rectangular areas delimiting an object of interest). The images are used in the context of two principal challenges: object classification and object detection. New datasets have been released each year since 2006. The images are organized into 20 classes as follows:
-
Person: person
-
Animal: bird, cat, cow, dog, horse, sheep
-
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
-
Indoor: bottle, chair, dining table, potted plant, sofa, TV/monitor
The PASCAL VOC dataset has been used by Divvala and colleagues [27] and Heitz and Koller [51], among others.
In spite of its great popularity and usefulness, the PASCAL dataset has been deemed unsuitable for experiments with context-based object recognition algorithms by Choi et al. [18], because most images contain instances of very few object categories (more than 50% of the images contain only a single object class) and because objects’ bounding boxes occupy a large portion (typically 20%) of the image.
-
LabelMe dataset [96] http://labelme.csail.mit.edu/
The LabelMe dataset consists of an ever-growing collection of 180,000+ images and associated annotations, contributed collaboratively by its users. The images, along with MATLAB code to process, query, and annotate them, are publicly available and cover a wide range of topics and scenarios. One of the main criticisms of LabelMe is that the dataset is incompletely labeled: volunteer annotators are free to choose which objects to annotate and which to omit, making it difficult to establish precision and recall for detection and classification tasks [31]. Consequently, researchers interested in using LabelMe for their experimental evaluations typically adopt selected subsets of the database for training and testing and ensure that these subsets are completely annotated. Subsets of the LabelMe dataset have been used by Oliva and Torralba [81], among others.
-
SUN 09 dataset [18] http://web.mit.edu/∼myungjin/www/HContext.html
A few months ago, a new dataset was proposed and made available to the research community: the SUN 09 dataset, which contains 12,000 annotated images covering a large number of scene categories (indoor and outdoor), with more than 200 object categories and 152,000 annotated object instances. SUN 09 contains images collected from multiple sources (Google, Flickr, Altavista, LabelMe) and does not include images of objects on white backgrounds or close-ups, i.e., images with no significant context information. It was annotated using LabelMe [96] by a single annotator and verified for consistency [18].
In the SUN 09 dataset, the average object size is 5% of the image size, and a typical image contains seven different object categories. In their evaluation, Choi et al. demonstrate that SUN 09 contains richer contextual information than PASCAL VOC 2007 over the same 20 categories. They also demonstrate that the contextual information learned from SUN 09 significantly improves the accuracy of object recognition tasks and can even be used to identify scenes containing out-of-context objects (e.g., objects with the wrong pose, scale, or co-occurrence) [18].
-
SUN dataset [123] http://groups.csail.mit.edu/vision/SUN/
The Scene UNderstanding (SUN) dataset was introduced earlier this year and is targeted at research in scene classification, which has customarily been tested on a fairly small number of semantic categories (usually 15 or fewer). The SUN dataset contains 899 categories and 130,519 images. The number of images varies across categories, but there are at least 100 images per category. Of the 899 categories, the authors of [123] use 397 well-sampled categories to evaluate numerous state-of-the-art algorithms for scene recognition and to establish new performance bounds. All images, associated code, and the training and testing partitions are available for download at the URL indicated above.
-
NUS-WIDE [21] http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm
NUS-WIDE is a publicly available image dataset created by the Lab for Media Search at the National University of Singapore (NUS). The dataset includes:
1. 269,648 images and the associated tags from Flickr, with a total number of 5,018 unique tags;
2. six types of low-level features extracted from these images: 64-D color histogram, 144-D color correlogram, 73-D edge direction histogram, 128-D wavelet texture, 225-D block-wise color moments, and 500-D bag of words based on SIFT descriptions; and
3. ground-truth for 81 concepts that can be used for evaluation.
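As an illustration of the first of these features, a 64-D color histogram can be obtained by quantizing each RGB channel into 4 levels, giving 4 × 4 × 4 = 64 joint bins. Note that this is one plausible reading of a 64-dimensional color histogram, not necessarily NUS-WIDE’s exact extraction recipe.

```python
import numpy as np

def color_histogram_64d(img):
    """64-D color histogram: quantize each RGB channel into 4 levels
    (4*4*4 = 64 joint bins) and count pixels per bin. Illustrative
    sketch only; NUS-WIDE's exact binning may differ."""
    # img: H x W x 3 array of uint8 values in [0, 255]
    q = (img // 64).astype(int)                       # 4 levels per channel
    idx = q[..., 0] * 16 + q[..., 1] * 4 + q[..., 2]  # joint bin index, 0..63
    hist = np.bincount(idx.ravel(), minlength=64).astype(float)
    return hist / hist.sum()                          # normalize to sum to 1

# toy example: a pure-red image puts all mass in a single bin
img = np.zeros((8, 8, 3), dtype=np.uint8)
img[..., 0] = 255
h = color_histogram_64d(img)
print(h.shape, h.argmax())  # (64,) 48
```

Such global color statistics are weak descriptors on their own, but, as the NUS-WIDE experiments show, they combine usefully with tags and other features for concept annotation.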
Additionally, there are several web pages with lists of links to useful datasets for computer vision, including:
-
http://userweb.cs.utexas.edu/∼grauman/courses/spring2008/datasets.htm—maintained by Prof. Kristen Grauman (University of Texas at Austin)
-
http://www.cs.ubc.ca/∼murphyk/Vision/objectRecognitionDatabases.html—maintained by Prof. Kevin Murphy (University of British Columbia)
-
http://www.cs.cmu.edu/∼efros/courses/LBMV07/databases.htm—maintained by Prof. Alexei Efros (Carnegie Mellon University)
Marques, O., Barenholtz, E. & Charvillat, V. Context modeling in computer vision: techniques, implications, and applications. Multimed Tools Appl 51, 303–339 (2011). https://doi.org/10.1007/s11042-010-0631-y