skip to main content
research-article

Object class detection: A survey

Authors Info & Claims
Published:11 July 2013Publication History
Skip Abstract Section

Abstract

Object class detection, also known as category-level object detection, has become one of the most focused areas in computer vision in the new century. This article attempts to provide a comprehensive survey of the recent technical achievements in this area of research. More than 270 major publications are included in this survey covering different aspects of the research, which include: (i) problem description: key tasks and challenges; (ii) core techniques: appearance modeling, localization strategies, and supervised classification methods; (iii) evaluation issues: approaches, metrics, standard datasets, and state-of-the-art results; and (iv) new development: particularly new approaches and applications motivated by the recent boom of social images. Finally, in retrospect of what has been achieved so far, the survey also discusses what the future may hold for object class detection research.

Skip Supplemental Material Section

Supplemental Material

References

  1. Aggarwal, J. K. and Ryoo, M. S. 2011. Human activity analysis: A review. ACM Comput. Surv. 43, 1--43. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Alexe, B., Deselaers, T., and Ferrari, V. 2010. What is an object? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  3. An, S. J., Peursum, P., Liu, W. Q., and Venkatesh, S. 2009. Efficient algorithms for subwindow search in object detection and localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09).Google ScholarGoogle Scholar
  4. Andriluka, M., Roth, S., and Schiele, B. 2009. Pictorial structures revisited: People detection and articulated pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09).Google ScholarGoogle Scholar
  5. Arbelaez, P., Maire, M., Fowlkes, C., and Malik, J. 2009. From contours to regions: An empirical evaluation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09).Google ScholarGoogle Scholar
  6. Atkins, C. B. 2008. Blocked recursive image composition. In Proceedings of the ACM International Conference on Multimedia (ACM/MM'08). Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Aytar, Y. and Zisserman, A. 2011. Tabula rasa: Model transfer for object category detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'11). Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Bay, H., Ess, A., Tuytelaars, T., and Van Gool, L. 2008. Speeded-up robust features (surf). Comput Vis. Image Understand. 110, 346--359. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Bay, H., Tuytelaars, T., and Van Gool, L. 2006. SURF: Speeded up robust features. In Proceedings of the European Conference on Computer Vision (ECCV'06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Belongie, S., Malik, J., and Puzicha, J. 2001. Matching shapes. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'01).Google ScholarGoogle Scholar
  11. Belongie, S., Malik, J., and Puzicha, J. 2002. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. 24, 509--522. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Bentley, J. 1984. Programming pearls: Algorithm design techniques. Comm. ACM 27, 865--873. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Biederman, I., Mezzanotte, R., and Rabinowitz, J. 1982. Scene perception: Detecting and judging objects undergoing relational violations. Cogn. Psychol. 14, 143--177.Google ScholarGoogle ScholarCross RefCross Ref
  14. Blei, D. M., Ng, A. Y., and Jordan, M. I. 2003. Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993--1022. Google ScholarGoogle ScholarCross RefCross Ref
  15. Boiman, O., Shechtman, E., and Irani, M. 2008. In defense of nearest-neighbor based image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08).Google ScholarGoogle Scholar
  16. Borenstein, E., Sharon, E., and Ullman, S. 2004. Combining top-down and bottom-up segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04). Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Borenstein, E. and Ullman, S. 2008. Combined top-down/bottom-up segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 30, 2109--2125. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Bosch, A., Zisserman, A., and Munoz, X. 2007a. Representing shape with a spatial pyramid kernel. In Proceedings of the ACM International Conference on Image and Video Retrieval (CIVR'07). Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Bosch, A., Zisserman, A., and Muoz, X. 2007b. Image classification using random forests and ferns. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'07).Google ScholarGoogle Scholar
  20. Bouchard, G. and Triggs, B. 2005. Hierarchical part-based visual object categorization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'05). Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Boureau, Y. L., Bach, F., Lecun, Y., and Ponce, J. 2010. Learning mid-level features for recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  22. Bray, M., Kohli, P., and Torr, P. 2006. PoseCut: Simultaneous segmentation and 3d pose estimation of humans using dynamic graph-cuts. In Proceedings of the European Conference on Computer Vision (ECCV'06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Cai, H. P., Yan, F., and Mikolajczyk, K. 2010. Learning weights for codebook in image classification and retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  24. Cao, Y., Wang, C. H., Li, Z. W., Zhang, L. Q., and Zhang, L. 2010. Spatial-bag-of-features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  25. Carneiro, G. and Lowe, D. 2006. Sparse flexible models of local features. In Proceedings of the European Conference on Computer Vision (ECCV'06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Carreira, J., Li, F., and Sminchisescu, C. 2011. Object recognition by sequential figure-ground ranking. Int. J. Comput. Vis. 98, 3, 243--262. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Carreira, J. and Sminchisescu, C. 2010. Constrained parametric min-cuts for automatic object segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  28. Chen, T., Cheng, M.-M., Tan, P., Shamir, A., and Hu, S.-M. 2009. Sketch2Photo: Internet image montage. In Proceedings of the ACM SIGGRAPH Asia Papers. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Chen, Y., Zhu, L. L., Li, C. L., Yuille, A., and Zhang, H. 2007. Rapid inference on a novel and/or graph for object detection, segmentation and parsing. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS'07).Google ScholarGoogle Scholar
  30. Chia, A. Y. S., Rahardja, S., Rajan, D., and Leung, M. K. H. 2009. Structural descriptors for category level object detection. IEEE Trans. Multimedia 11, 1407--1421. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Christoudias, C. M., Urtasun, R., and Darrell, T. 2008. Unsupervised feature selection via distributed coding for multi-view object recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08).Google ScholarGoogle Scholar
  32. Crandall, D., Felzenszwalb, P., and Huttenlocher, D. 2005. Spatial priors for part-based recognition using statistical models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'05). Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Csurka, G., Dance, C., Fan, L., Willamowski, J., and Bray, C. 2004. Visual categorization with bags of keypoints. In Proceedings of the ECCV Workshop on Statistical Learning in Computer Vision (ECCVW'04).Google ScholarGoogle Scholar
  34. Csurka, G., Dance, C., Perronnin, F., and Willamowski, J. 2006. Generic visual categorization using weak geometry. In Toward Category-Level Object Recognition, J. Ponce, M. Hebert, C. Schmid, and A. Zisserman, Eds., Springer, 207--224.Google ScholarGoogle Scholar
  35. Dalal, N. 2006. Finding people in images and videos. Tech. rep., Institut National Polytechnique de Grenoble.Google ScholarGoogle Scholar
  36. Dalal, N. and Triggs, B. 2005. Histograms of oriented gradients for human detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'05). Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Dalal, N., Triggs, B., and Schmid, C. 2006. Human detection using oriented histograms of flow and appearance. In Proceedings of the European Conference on Computer Vision (ECCV'06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Datta, R., Joshi, D., Li, J., and Wang, J. Z. 2008. Image retrieval: Ideas, influences, and trends of the new age. ACM Comput. Surv. 40, 1--60. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Deselaers, T. and Ferrari, V. 2010. Global and efficient self-similarity for object classification and detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  40. Dickinson, S. 2009. The evolution of object categorization and the challenge of image abstraction. In Object Categorization: Computer and Human Vision Perspectives, A. L. S. Dickinson, B. Schiele, and M. Tarr, Eds., Cambridge University Press, 1--37.Google ScholarGoogle Scholar
  41. Divvala, S. K., Hoiem, D., Hays, J. H., Efros, A. A., and Hebert, M. 2009. An empirical study of context in object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09).Google ScholarGoogle Scholar
  42. Dollar, P., Belongie, S., and Perona, P. 2010. The fastest pedestrian detector in the west. In Proceedings of the British Machine Vision Conference (BMVC'10). BMVA Press.Google ScholarGoogle Scholar
  43. Dollar, P., Tu, Z., Perona, P., and Belongie, S. 2009. Integral channel features. In Proceedings of the British Machine Vision Conference (BMVC'09).Google ScholarGoogle Scholar
  44. Dollar, P., Wojek, C., Schiele, B., and Perona, P. 2011. Pedestrian detection: An evaluation of the state of the art. IEEE Trans. Pattern Anal. Mach. Intell. 34, 4, 743--761. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Endres, I. and Hoiem, D. 2010. Category independent object proposals. In Proceedings of the European Conference on Computer Vision (ECCV'10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Enzweiler, M. and Gavrila, D. M. 2008. A mixed generative-discriminative framework for pedestrian classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08).Google ScholarGoogle Scholar
  47. Everingham, M., Van Gool, L., Williams, C., Winn, J., and Zisserman, A. 2010. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 88, 2, 303--338. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Fan, J. P., Shen, Y., Zhou, N., and Gao, Y. L. 2010. Harvesting large-scale weakly-tagged image databases from the web. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  49. Fei-Fei, L., Fergus, R., and Perona, P. 2004. Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04). Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Fei-Fei, L. and Perona, P. 2005. A Bayesian hierarchical model for learning natural scene categories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'04). Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Fei-Fei, L., Vanrullen, R., Koch, C., and Perona, P. 2002. Rapid natural scene categorization in the near absence of attention. Proc. Nat. Acad. Sci. 2, 9596--9601.Google ScholarGoogle Scholar
  52. Fei-Fei, L., Fergus, R., and Torralba, A. 2005. Recognizing and learning object categories. In International Conference on Computer Vision Short Course (ICCV'05). MIT.Google ScholarGoogle Scholar
  53. Fei-Fei, L., Fergus, R., and Torralba, A. 2007. Recognizing and learning object categories. In Computer Vision and Pattern Recognition Short Course (CVPR'07).Google ScholarGoogle Scholar
  54. Fei-Fei, L., Fergus, R., and Torralba, A. 2009. Recognizing and learning object categories. In International Conference on Computer Vision Short Course (ICCV'09).Google ScholarGoogle Scholar
  55. Felleman, D. J. and Van Essen, D. C. 1991. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex 1, 1--47.Google ScholarGoogle ScholarCross RefCross Ref
  56. Felzenszwalb, P., Mcallester, D., and Ramanan, D. 2008. A discriminatively trained, multiscale, deformable part model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08).Google ScholarGoogle Scholar
  57. Felzenszwalb, P. F., Girshick, R. B., and Mcallester, D. 2010a. Cascade object detection with deformable part models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  58. Felzenszwalb, P. F., Girshick, R. B., Mcallester, D., and Ramanan, D. 2010b. Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32, 1627--1645. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Felzenszwalb, P. F. and Huttenlocher, D. P. 2005. Pictorial structures for object recognition. Int. J. Comput. Vis. 61, 55--79. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. Felzenszwalb, P. F. and Veksler, O. 2010. Tiered scene labeling with dynamic programming. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  61. Ferencz, A., Learned-Miller, E., and Malik, J. 2008. Learning to locate informative features for visual identification. Int. J. Comput. Vis. 77, 3--24. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Fergus, R., Li, F.-F., Perona, P., and Zisserman, A. 2010. Learning object categories from internet image searches. Proc. IEEE. 98, 1453--1466.Google ScholarGoogle ScholarCross RefCross Ref
  63. Fergus, R., Perona, P., and Zisserman, A. 2003. Object class recognition by unsupervised scale-invariant learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'03).Google ScholarGoogle Scholar
  64. Fergus, R., Perona, P., and Zisserman, A. 2005. A sparse object category model for efficient learning and exhaustive recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'05). Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. Fergus, R., Perona, P., and Zisserman, A. 2007. Weakly supervised scale-invariant learning of models for visual recognition. Int. J. Comput. Vis. 71, 273--303. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. Ferrari, V., Fevrier, L., Jurie, F., and Schmid, C. 2008. Groups of adjacent contour segments for object detection. IEEE Trans. Pattern Anal. Mach. Intell. 30, 36--51. Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. Fischler, M. A. and Elschlager, R. A. 1973. The representation and matching of pictorial structures. IEEE Trans. Comput. C-22, 67--92. Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. Fleuret, F. and Geman, D. 2001. Coarse-to-fine face detection. Int. J. Comput. Vis. 41, 85--107. Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. Fulkerson, B., Vedaldi, A., and Soatto, S. 2009. Class segmentation and object localization with superpixel neighborhoods. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'09).Google ScholarGoogle Scholar
  70. Gall, J. and Lempitsky, V. 2009. Class-specific hough forests for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09).Google ScholarGoogle Scholar
  71. Gallagher, A., Neustaedter, C., Cao, L., Luo, J., and Chen, T. 2008. Image annotation using personal calendars as context. In Proceedings of the ACM International Conference on Multimedia (ACM/MM'08). Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. Gallagher, A. C. and Chen, T. 2008. Estimating age, gender, and identity using first name priors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08).Google ScholarGoogle Scholar
  73. Galleguillos, C. and Belongie, S. 2010. Context based object categorization: A critical survey. Comput Vis. Image Understand. 114, 712--722. Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. Galleguillos, C., Mcfee, B., Belongie, S., and Lanckriet, G. 2010. Multi-class object localization by combining local contextual interactions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  75. Gehler, P. and Nowozin, S. 2009. On feature combination for multiclass object classification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'09).Google ScholarGoogle Scholar
  76. Girshick, R. B., Felzenszwalb, P. F., and Mcallester, D. 2011. Object detection with grammar models. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS'11).Google ScholarGoogle Scholar
  77. Gonfaus, J. M., Boix, X., Van de Weijer, J., Bagdanov, A. D., Serrat, J., and Gonzalez, J. 2010. Harmony potentials for joint classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  78. Gould, S., Fulton, R., and Koller, D. 2009a. Decomposing a scene into geometric and semantically consistent regions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09).Google ScholarGoogle Scholar
  79. Gould, S., Gao, T. S., and Koller, D. 2009b. Region-based segmentation and object detection. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS'09).Google ScholarGoogle Scholar
  80. Grabner, H., Roth, P. M., and Bischof, H. 2007. Eigenboosting: Combining discriminative and generative information. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'07).Google ScholarGoogle Scholar
  81. Grauman, K. and Leibe, B. 2011. Visual object recognition. Synthesis Lectures Artif. Intell. Mach. Learn. 5, 1--181.Google ScholarGoogle ScholarCross RefCross Ref
  82. Griffin, G., Holub, A., and Perona, P. 2007. Caltech-256 object category dataset. Tech. rep., California Institute of Technology, 1-20. http://authors.library.caltech.edu/7694/1/CNS-TR-2007-001.pdf.Google ScholarGoogle Scholar
  83. Gu, C. H., Lim, J. J., Arbelaez, P., and Malik, J. 2009. Recognition using regions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09).Google ScholarGoogle Scholar
  84. Guillaumin, M., Verbeek, J., and Schmid, C. 2010. Multimodal semi-supervised learning for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  85. Hays, J. and Efros, A. A. 2008. IM2GPS: Estimating geographic information from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08).Google ScholarGoogle Scholar
  86. He, X. M., Zemel, R., and Ray, D. 2006. Learning and incorporating top-down cues in image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV'06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  87. He, X. M., Zemel, R. S., and Carreira-Perpinan, M. A. 2004. Multiscale conditional random fields for image labeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'04). Google ScholarGoogle ScholarDigital LibraryDigital Library
  88. Heitz, G., Elidan, G., Packer, B., and Koller, D. 2009. Shape-based object localization for descriptive classification. Int. J. Comput. Vis. 84, 40--62. Google ScholarGoogle ScholarDigital LibraryDigital Library
  89. Hochstein, S. and Ahissar, M. 2002. View from the top: Hierarchies and reverse hierarchies in the visual system. Neuron 36, 791--804.Google ScholarGoogle ScholarCross RefCross Ref
  90. Hofmann, T. 2001. Unsupervised learning by probabilistic latent semantic analysis. Mach. Learn. 42, 177--196. Google ScholarGoogle ScholarDigital LibraryDigital Library
  91. Hoiem, D., Efros, A., and Hebert, M. 2008. Putting objects in perspective. Int. J. Comput. Vis. 80, 3--15. Google ScholarGoogle ScholarDigital LibraryDigital Library
  92. Hoiem, D., Rother, C., and Winn, J. 2007a. 3D layoutcrf for multi-view object class recognition and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'07).Google ScholarGoogle Scholar
  93. Hoiem, D., Stein, A., Efros, A., and Hebert, M. 2007b. Recovering occlusion boundaries from a single image. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'07).Google ScholarGoogle Scholar
  94. Huang, Y. Z., Huang, K. Q., Wang, L. S., Tao, D. C., Tan, T. N., and Li, X. L. 2008. Enhanced biologically inspired model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08).Google ScholarGoogle Scholar
  95. Hwang, S. J. and Grauman, K. 2010. Reading between the lines: Object localization using implicit cues from image tags. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  96. Jain, A., Gupta, A., and Davis, L. 2010. Learning what and how of contextual models for scene labeling. In Proceedings of the European Conference on Computer Vision (ECCV'10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  97. Jhuang, H., Serre, T., Wolf, L., and Poggio, T. 2007. A biologically inspired system for action recognition. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'07).Google ScholarGoogle Scholar
  98. Ji, R. R., Yao, H. X., Sun, X. S., Zhong, B. N., and Gao, W. 2010. Towards semantic embedding in visual vocabulary. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  99. Jiang, Y.-G., Yang, J., Ngo, C.-W., and Hauptmann, A. G. 2010. Representations of keypoint-based semantic concept detection: A comprehensive study. IEEE Trans. Multimedia 12, 42--53. Google ScholarGoogle ScholarDigital LibraryDigital Library
  100. Joachims, T. 1997. A probabilistic analysis of the rocchio algorithm with tfidf for text categorization. In Proceedings of the International Conference on Machine Learning (ICML'97). Google ScholarGoogle ScholarDigital LibraryDigital Library
  101. Joachims, T. 1998. Making large-scale support vector machine learning practical. In Advances in Kernel Methods: Support Vector Machines, B. Scholkopf, J. C. Burges, and A. J. Smola, Eds. MIT Press, Cambridge, MA, 169--184. Google ScholarGoogle ScholarDigital LibraryDigital Library
  102. Jones, J. P. and Palmer, L. A. 1987. An evaluation of the two-dimensional gabor filter model of simple receptive fields in cat striate cortex. J. Neurophys. 58, 1233--1258.Google ScholarGoogle ScholarCross RefCross Ref
  103. Jurie, F. and Triggs, B. 2005. Creating efficient codebooks for visual recognition. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'05). Google ScholarGoogle ScholarDigital LibraryDigital Library
  104. Karlinsky, L., Dinerstein, M., Harari, D., and Ullman, S. 2010. The chains model for detecting parts by their context. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  105. Ke, Y. and Sukthankar, R. 2004. PCA-sift: A more distinctive representation for local image descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'04). Google ScholarGoogle ScholarDigital LibraryDigital Library
  106. Knopp, J., Prasad, M., and Gool, L. V. 2011. Scene cut: Class-specific object detection and segmentation in 3d scenes. In Proceedings of the International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT'11). Google ScholarGoogle ScholarDigital LibraryDigital Library
  107. Koh, K., Kim, S.-J., and Boyd, S. 2007. An interior-point method for large-scale l1-regularized logistic regression. J. Mach. Learn. Res. 8, 1519--1555. Google ScholarGoogle ScholarDigital LibraryDigital Library
  108. Kohli, P., Ladicky, L., and Torr, P. 2008. Robust higher order potentials for enforcing label consistency. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08).Google ScholarGoogle Scholar
  109. Kotsiantis, S. B. 2007. Supervised machine learning: A review of classification techniques. Informatica 31, 249--268.Google ScholarGoogle Scholar
  110. Krüger, V., Kragic, D., Ude, A., and Geib, C. 2007. The meaning of action: A review on action recognition and mapping. Advan. Robot. 21, 1473--1501.Google ScholarGoogle ScholarCross RefCross Ref
  111. Kuettel, D. and Ferrari, V. 2012. Figure-ground segmentation by transferring window masks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'12). Google ScholarGoogle ScholarDigital LibraryDigital Library
  112. Kumar, M. P., Ton, P. H. S., and Zisserman, A. 2005. OBJCUT. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'05).Google ScholarGoogle Scholar
  113. Kumar, M. P., Torr, P. H. S., and Zisserman, A. 2010. OBJCUT: Efficient segmentation using top-down and bottom-up cues. IEEE Trans. Pattern Anal. Mach. Intell. 32, 530--545. Google ScholarGoogle ScholarDigital LibraryDigital Library
  114. Kumar, N., Belhumeur, P., and Nayar, S. 2008. FaceTracer: A search engine for large collections of images with faces. In Proceedings of the European Conference on Computer Vision (ECCV'08). Google ScholarGoogle ScholarDigital LibraryDigital Library
  115. Ladicky, L., Sturgess, P., Alahari, K., Russell, C., and Torr, P. 2010. What, where and how many? Combining object detectors and crfs. In Proceedings of the European Conference on Computer Vision (ECCV'10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  116. Lalonde, J.-F., Hoiem, D., Efros, A. A., Rother, C., Winn, J., and Criminisi, A. 2007. Photo clip art. In Proceedings of the International Conference and Exhibition on Computer Graphics and Interactive Techniques (ACM/SIGGRAPH'07). Google ScholarGoogle ScholarDigital LibraryDigital Library
  117. Lalonde, J.-F., Narasimhan, S. G., and Efros, A. A. 2010. What do the sun and the sky tell us about the camera? Int. J. Comput. Vis. 88, 24--51. Google ScholarGoogle ScholarDigital LibraryDigital Library
  118. Lampert, C. H., Blaschko, M. B., and Hofmann, T. 2008. Beyond sliding windows: Object localization by efficient sub-window search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08).Google ScholarGoogle Scholar
  119. Laptev, I. 2006. Improvements of object detection using boosted histograms. In Proceedings of the British Machine Vision Conference (BMVC'06).Google ScholarGoogle ScholarCross RefCross Ref
  120. Laptev, I. 2009. Improving object detection with boosted histograms. Image Vis. Comput. 27, 535--544. Google ScholarGoogle ScholarDigital LibraryDigital Library
  121. Larlus, D. and Jurie, F. 2008. Combining appearance models and markov random fields for category level object segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08).Google ScholarGoogle Scholar
  122. Larlus, D., Verbeek, J., and Jurie, F. 2010. Category level object segmentation by combining bag-of-words models with dirichlet processes and random fields. Int. J. Comput. Vision 88, 238--253. Google ScholarGoogle ScholarDigital LibraryDigital Library
  123. Lasserre, J. A., Bishop, C. M., and Minka, T. P. 2006. Principled hybrids of generative and discriminative models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  124. Lazebnik, S., Schmid, C., and Ponce, J. 2006. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  125. Lee, H., Battle, A., Raina, R., and Ng, A. Y. 2006. Efficient sparse coding algorithms. Adv. Neural Inf. Process. Syst. 19, 2007.Google ScholarGoogle Scholar
  126. Lee, Y. J. and Grauman, K. 2010. Object-graphs for context-aware category discovery. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  127. Leibe, B., Leonardis, A., and Schiele, B. 2004. Combined object categorization and segmentation with an implicit shape model. In Proceedings of the ECCV Workshop on Statistical Learning in Computer Vision (ECCVW'04).Google ScholarGoogle Scholar
  128. Leibe, B., Leonardis, A., and Schiele, B. 2006. An implicit shape model for combined object categorization and segmentation. In Toward Category-Level Object Recognition, J. Ponce, M. Hebert, C. Schmid, and A. Zisserman, Eds., Springer, 508--524.Google ScholarGoogle Scholar
  129. Leibe, B., Leonardis, A., and Schiele, B. 2008. Robust object detection with interleaved categorization and segmentation. Int. J. Comput. Vis. 77, 259--289. Google ScholarGoogle ScholarDigital LibraryDigital Library
  130. Leibe, B., Seemann, E., and Schiele, B. 2005. Pedestrian detection in crowded scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'05). Google ScholarGoogle ScholarDigital LibraryDigital Library
  131. Lempitsky, V., Kohli, P., Rother, C., and Sharp, T. 2009. Image segmentation with a bounding box prior. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'09).Google ScholarGoogle Scholar
  132. Levin, A. and Weiss, Y. 2006. Learning to combine bottom-up and top-down segmentation. In Proceedings of the European Conference on Computer Vision (ECCV'06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  133. Li, L.-J. and Fei-Fei, L. 2007. What, where and who? Classifying events by scene and object recognition. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'07).Google ScholarGoogle Scholar
  134. Li, L.-J. and Fei-Fei, L. 2010. OPTIMOL: Automatic online picture collection via incremental model learning. Int. J. Comput. Vis. 88, 147--168. Google ScholarGoogle ScholarDigital LibraryDigital Library
  135. Liang, P. and Jordan, M. I. 2008. An asymptotic analysis of generative, discriminative, and pseudolikelihood estimators. In Proceedings of the International Conference on Machine Learning (ICML'08). Google ScholarGoogle ScholarDigital LibraryDigital Library
  136. Liebelt, J. and Schmid, C. 2010. Multi-view object class detection with a 3d geometric model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  137. Liebelt, J., Schmid, C., and Schertler, K. 2008. Viewpoint-independent object class detection using 3d feature maps. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08).Google ScholarGoogle Scholar
  138. Lin, D., Kapoor, A., Hua, G., and Baker, S. 2010. Joint people, event, and location recognition in personal photo collections using cross-domain context. In Proceedings of the European Conference on Computer Vision (ECCV'10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  139. Lin, Z. 2009. Modeling shape, appearance and motion for human movement analysis. Tech. rep., Department of Electrical and Computer Engineering, University of Maryland, College Park, Md. http://hdl.handle.net/1903/9279.Google ScholarGoogle Scholar
  140. Lin, Z. and Davis, L. S. 2010. Shape-based human detection and segmentation via hierarchical part-template matching. IEEE Trans. Pattern Anal. Mach. Intell. 32, 604--618. Google ScholarGoogle ScholarDigital LibraryDigital Library
  141. Lin, Z., Davis, L. S., Doermann, D., and Dementhon, D. 2007. Hierarchical part-template matching for human detection and segmentation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'07).Google ScholarGoogle Scholar
  142. Liu, C., Yuen, J., and Torralba, A. 2009a. Nonparametric scene parsing: Label transfer via dense scene alignment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09).Google ScholarGoogle Scholar
  143. Liu, C., Yuen, J., Torralba, A., Sivic, J., and Freeman, W. 2008. SIFT flow: Dense correspondence across different scenes. In Proceedings of the European Conference on Computer Vision (ECCV'08). Google ScholarGoogle ScholarDigital LibraryDigital Library
  144. Liu, T., Wang, J. D., Sun, J., Zheng, N. N., Tang, X. O., and Shum, H. Y. 2009b. Picture collage. IEEE Trans. Multimedia 11, 1225--1239. Google ScholarGoogle ScholarDigital LibraryDigital Library
  145. Lowe, D. G. 2004. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60, 91--110. Google ScholarGoogle ScholarDigital LibraryDigital Library
  146. Lu, Z. W. and Ip, H. H. S. 2009. Image categorization with spatial mismatch kernels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09).Google ScholarGoogle Scholar
  147. Luo, J., Boutell, M., and Brown, C. 2006. Pictures are not taken in a vacuum. IEEE Signal Process. Mag. 23, 101--114.Google ScholarGoogle ScholarCross RefCross Ref
  148. Maire, M., Yu, S. X., and Perona, P. 2011. Object detection and segmentation from joint embedding of parts and pixels. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'11). Google ScholarGoogle ScholarDigital LibraryDigital Library
  149. Maji, S., Berg, A. C., and Malik, J. 2008. Classification using intersection kernel support vector machines is efficient. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08).Google ScholarGoogle Scholar
  150. Malisiewicz, T. and Efros, A. A. 2008. Recognition by association via learning per-exemplar distances. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08).Google ScholarGoogle Scholar
  151. Marszałek, M. and Schmid, C. 2007. Accurate object localization with shape masks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'07).Google ScholarGoogle Scholar
  152. Mikolajczyk, K. and Schmid, C. 2005. A performance evaluation of local descriptors. IEEE Trans. Pattern Anal. Mach. Intell. 27, 1615--1630. Google ScholarGoogle ScholarDigital LibraryDigital Library
  153. Moosmann, F., Nowak, E., and Jurie, F. 2008. Randomized clustering forests for image classification. IEEE Trans. Pattern Anal. Mach. Intell. 30, 1632--1646. Google ScholarGoogle ScholarDigital LibraryDigital Library
  154. Mu, Y., Yan, S., Liu, Y., Huang, T., and Zhou, B. 2008. Discriminative local binary patterns for human detection in personal album. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08).Google ScholarGoogle Scholar
  155. Mutch, J. and Lowe, D. 2008. Object class recognition and localization using sparse features with limited receptive fields. Int. J. Comput. Vis. 80, 45--57. Google ScholarGoogle ScholarDigital LibraryDigital Library
  156. Mutch, J. and Lowe, D. G. 2006. Multiclass object recognition with sparse, localized features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  157. Nakayama, H., Harada, T., and Kuniyoshi, Y. 2010. Global gaussian approach for scene categorization using information geometry. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  158. Narasimhan, S. and Nayar, S. 2002. Vision and the atmosphere. Int. J. Comput. Vis. 48, 233--254. Google ScholarGoogle ScholarDigital LibraryDigital Library
  159. Ng, A. and Jordan, M. 2002. On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS'02).Google ScholarGoogle Scholar
  160. Ni, B. B., Yan, S. C., and Kassim, A. 2009. Contextualizing histogram. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09).Google ScholarGoogle Scholar
  161. Nister, D. and Stewenius, H. 2006. Scalable recognition with a vocabulary tree. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  162. Nowak, E., Jurie, F., and Triggs, B. 2006. Sampling strategies for bag-of-features image classification. In Proceedings of the European Conference on Computer Vision (ECCV'06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  163. Ojala, T., Pietikainen, M., and Maenpaa, T. 2002. Multi-resolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 24, 971--987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  164. Oliva, A. and Torralba, A. 2001. Modeling the shape of the scene: A holistic representation of the spatial envelope. Int. J. Comput. Vis. 42, 145--175. Google ScholarGoogle ScholarDigital LibraryDigital Library
  165. Oliva, A. and Torralba, A. 2006. Building the gist of a scene: The role of global image features in recognition. Progress Brain Res. 155, 23--36.Google ScholarGoogle ScholarCross RefCross Ref
  166. Oliva, A. and Torralba, A. 2007. The role of context in object recognition. Trends Cogn. Sci. 11, 520--527.Google ScholarGoogle ScholarCross RefCross Ref
  167. Opelt, A., Pinz, A., and Zisserman, A. 2006. A boundary-fragment-model for object detection. In Proceedings of the European Conference on Computer Vision (ECCV'06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  168. Palmese, M. and Trucco, A. 2008. From 3-d sonar images to augmented reality models for objects buried on the seafloor. IEEE Trans. Instrument. Measure. 57, 820--828.Google ScholarGoogle ScholarCross RefCross Ref
  169. Parikh, D. and Zitnick, C. L. 2010. The role of features, algorithms and data in visual recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  170. Park, D., Ramanan, D., and Fowlkes, C. 2010. Multiresolution models for object detection. In Proceedings of the European Conference on Computer Vision (ECCV'10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  171. Pedersoli, M., Vedaldi, A., and Gonzalez, J. 2011. A coarse-to-fine approach for fast deformable object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'11). Google ScholarGoogle ScholarDigital LibraryDigital Library
  172. Perronnin, F. 2008. Universal and adapted vocabularies for generic visual categorization. IEEE Trans. Pattern Anal. Mach. Intell. 30, 1243--1256. Google ScholarGoogle ScholarDigital LibraryDigital Library
  173. Perrotton, X., Sturzel, M., and Roux, M. 2010. Implicit hierarchical boosting for multi-view object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  174. Pinz, A. 2005. Object categorization. Foundat. Trends Comput. Graph. Vis. 1, 4, 255--353. Google ScholarGoogle ScholarDigital LibraryDigital Library
  175. Ponce, J., Berg, T. L., Everingham, M., Forsyth, D. A., Hebert, M., Lazebnik, S., Marszałek, M., Schmid, C., Russell, B. C., Torralba, A., Williams, C. K. I., Zhang, J., and Zisserman, A. 2006a. Dataset issues in object recognition. In Toward Category-Level Object Recognition, J. Ponce, M. Hebert, C. Schmid, and A. Zisserman Eds., Springer, 29--48.Google ScholarGoogle Scholar
  176. Ponce, J., Hebert, M., Schmid, C., and Zisserman, A. 2006b. Toward Category-Level Object Recognition. Lecture Notes in Computer Science, vol. 4170, Springer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  177. Porikli, F. 2005. Integral histogram: A fast way to extract histograms in cartesian spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'05). Google ScholarGoogle ScholarDigital LibraryDigital Library
  178. Rabinovich, A., Vedaldi, A., Galleguillos, C., Wiewiora, E., and Belongie, S. 2007. Objects in context. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'07).Google ScholarGoogle Scholar
  179. Ravishankar, S., Jain, A., and Mittal, A. 2008. Multi-stage contour based detection of deformable objects. In Proceedings of the European Conference on Computer Vision (ECCV'08). Google ScholarGoogle ScholarDigital LibraryDigital Library
  180. Razavi, N., Gall, J., and Van Gool, L. 2010. Backprojection revisited: Scalable multi-view object detection and similarity metrics for detections. In Proceedings of the European Conference on Computer Vision (ECCV'10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  181. Ren, X. and Malik, J. 2003. Learning a classification model for segmentation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'03). Google ScholarGoogle ScholarDigital LibraryDigital Library
  182. Riesenhuber, M. and Poggio, T. 1999. Hierarchical models of object recognition in cortex. Nature Neurosci. 2, 1019--1025.Google ScholarGoogle ScholarCross RefCross Ref
  183. Rother, C., Bordeaux, L., Hamadi, Y., and Blake, A. 2006. AutoCollage. ACM Trans. Graph. 25, 3, 847--852. Google ScholarGoogle ScholarDigital LibraryDigital Library
  184. Rother, C., Kolmogorov, V., and Blake, A. 2004. “GrabCut”: Interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 23, 3, 309--314. Google ScholarGoogle ScholarDigital LibraryDigital Library
  185. Rubinstein, D. and Hastie, T. 1997. Discriminative vs informative learning. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDDM'97).Google ScholarGoogle Scholar
  186. Rubner, Y., Tomasi, C., and Guibas, L. J. 2000. The earth mover's distance as a metric for image retrieval. Int. J. Comput. Vis. 40, 99--121. Google ScholarGoogle ScholarDigital LibraryDigital Library
  187. Rui, X., Li, M., Li, Z., Ma, W.-Y., and Yu, N. 2007. Bipartite graph reinforcement model for web image annotation. In Proceedings of the ACM International Conference on Multimedia (ACM/MM'07). Google ScholarGoogle ScholarDigital LibraryDigital Library
  188. Russell, B., Torralba, A., Liu, C., Fergus, R., and Freeman, W. 2007. Object recognition by scene alignment. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS'07).Google ScholarGoogle Scholar
  189. Russell, B., Torralba, A., Murphy, K., and Freeman, W. 2008. LabelMe: A database and web-based tool for image annotation. Int. J. Comput. Vis. 77, 157--173. Google ScholarGoogle ScholarDigital LibraryDigital Library
  190. Sabzmeydani, P. and Mori, G. 2007. Detecting pedestrians by learning shapelet features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'07).Google ScholarGoogle Scholar
  191. Saffari, A., Godec, M., Pock, T., Leistner, C., and Bischof, H. 2010. Online multi-class lpboost. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  192. Salakhutdinov, R., Torralba, A., and Tenenbaum, J. 2011. Learning to share visual appearance for multiclass object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'11). Google ScholarGoogle ScholarDigital LibraryDigital Library
  193. Salzmann, M. and Urtasun, R. 2010. Combining discriminative and generative methods for 3d deformable surface and articulated pose reconstruction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  194. Savarese, S. and Li, F.-F. 2007. 3D generic object categorization, localization and pose estimation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'07).Google ScholarGoogle Scholar
  195. Savarese, S., Winn, J., and Criminisi, A. 2006. Discriminative object class models of appearance and shape by correlatons. In Preceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  196. Schindler, K., Van Gool, L., and De Gelder, B. 2008. Recognizing emotions expressed by body pose: A biologically inspired neural model. Neural Netw. 21, 1238--1246. Google ScholarGoogle ScholarDigital LibraryDigital Library
  197. Schroff, F. 2009. Semantic image segmentation and web-supervised visual learning. Tech. rep., Robotics Research Group, Department of Engineering Science. University of Oxford, Oxford, UK. http://www.robots.ox.ac.uk/∼vgg/publications/papers/schroff09.pdf.Google ScholarGoogle Scholar
  198. Seemann, E., Leibe, B., and Schiele, B. 2006. Multi-aspect detection of articulated objects. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  199. Serre, T., Oliva, A., and Poggio, T. 2007a. A feed-forward architecture accounts for rapid categorization. Proc. National Acad. Sci. 104, 6424--6429.Google ScholarGoogle ScholarCross RefCross Ref
  200. Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., and Poggio, T. 2007b. Robust object recognition with cortex-like mechanisms. IEEE Trans. Pattern Anal. Mach. Intell. 29, 411--426. Google ScholarGoogle ScholarDigital LibraryDigital Library
  201. Serre, T., Wolf, L., and Poggio, T. 2005. Object recognition with features inspired by visual cortex. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'05). Google ScholarGoogle ScholarDigital LibraryDigital Library
  202. Shechtman, E. and Irani, M. 2007. Matching local self-similarities across images and videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'07).Google ScholarGoogle Scholar
  203. Shin, Y., Kim, Y., and Kim, E. Y. 2010. Automatic textile image annotation by predicting emotional concepts from visual features. Image Vis. Comput. 28, 526--537. Google ScholarGoogle ScholarDigital LibraryDigital Library
  204. Shotton, J., Blake, A., and Cipolla, R. 2005. Contour-based learning for object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'05). Google ScholarGoogle ScholarDigital LibraryDigital Library
  205. Shotton, J., Blake, A., and Cipolla, R. 2008a. Multiscale categorical object recognition using contour fragments. IEEE Trans. Pattern Anal. Mach. Intell. 30, 1270--1281. Google ScholarGoogle ScholarDigital LibraryDigital Library
  206. Shotton, J., Johnson, M., and Cipolla, R. 2008b. Semantic texton forests for image categorization and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08).Google ScholarGoogle Scholar
  207. Shotton, J., Winn, J., Rother, C., and Criminisi, A. 2006. TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In Proceedings of the European Conference on Computer Vision (ECCV'06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  208. Shotton, J., Winn, J., Rother, C., and Criminisi, A. 2009. TextonBoost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context. Int. J. Comput. Vis. 81, 2--23. Google ScholarGoogle ScholarDigital LibraryDigital Library
  209. Simon, I. and Seitz, S. 2008. Scene segmentation using the wisdom of crowds. In Proceedings of the European Conference on Computer Vision (ECCV'08). Google ScholarGoogle ScholarDigital LibraryDigital Library
  210. Sivic, J., Russell, B. C., Efros, A. A., Zisserman, A., and Freeman, W. T. 2005. Discovering objects and their location in images. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'05). Google ScholarGoogle ScholarDigital LibraryDigital Library
  211. Sivic, J. and Zisserman, A. 2003. Video google: Text retrieval approach to object matching in videos. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'03). Google ScholarGoogle ScholarDigital LibraryDigital Library
  212. Snavely, N., Simon, I., Goesele, M., Szeliski, R., and Seitz, S. M. 2010. Scene reconstruction and visualization from community photo collections. Proc. IEEE. 98, 1370--1390.Google ScholarGoogle ScholarCross RefCross Ref
  213. Song, D. J. and Tao, D. C. 2010. Biologically inspired feature manifold for scene classification. IEEE Trans. Image Process. 19, 174--184. Google ScholarGoogle ScholarDigital LibraryDigital Library
  214. Song, Z., Chen, Q., Huang, Z., Hua, Y., and Yan, S. 2011. Contextualizing object detection and classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'11). Google ScholarGoogle ScholarDigital LibraryDigital Library
  215. Sonnenburg, S., Rutsch, G., Schafer, C., and Scholkopf, B. 2006. Large scale multiple kernel learning. J. Mach. Learn. Res. 7, 1531--1565. Google ScholarGoogle ScholarDigital LibraryDigital Library
  216. Strat, T. 1993. Employing contextual information in computer vision. In Proceedings of the ARPA Image Understanding Workshop. 217--229.Google ScholarGoogle Scholar
  217. Sutton, C. and McCallum, A. 2006. An introduction to conditional random fields for relational learning. In Introduction to Statistical Relational Learning, L. Getoor and B. Taskar, Eds., MIT Press. http://people.cs.umass.edu/∼mccallum/papers/crf-tutorial.pdf.Google ScholarGoogle Scholar
  218. Szeliski, R. 2010. Computer Vision: Algorithms and Applications. Springer. Google ScholarGoogle Scholar
  219. Tao, L., Yuan, L., and Sun, J. 2009. SkyFinder: Attribute-based sky image search. In ACM SIGGRAPH Papers. Google ScholarGoogle ScholarDigital LibraryDigital Library
  220. Thomas, A., Ferrari, V., Leibe, B., Tuytelaars, T., Schiele, B., and Van Gool, L. 2006. Towards multi-view object class detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  221. Torralba, A. 2003. Contextual priming for object detection. Int. J. Comput. Vis. 53, 169--191. Google ScholarGoogle ScholarDigital LibraryDigital Library
  222. Torralba, A., Fergus, R., and Freeman, W. T. 2008. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 30, 1958--1970. Google ScholarGoogle ScholarDigital LibraryDigital Library
  223. Torralba, A., Murphy, K., and Freeman, W. 2006. Shared features for multiclass object detection. In Toward Category-Level Object Recognition, J. Ponce, M. Hebert, C. Schmid, and A. Zisserman, Eds., Springer, 345--361.Google ScholarGoogle Scholar
  224. Torralba, A., Murphy, K. P., and Freeman, W. T. 2004. Sharing features: Efficient boosting procedures for multiclass object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'04). Google ScholarGoogle ScholarDigital LibraryDigital Library
  225. Torralba, A., Murphy, K. P., Freeman, W. T., and Rubin, M. A. 2003. Context-based vision system for place and object recognition. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'03). Google ScholarGoogle ScholarDigital LibraryDigital Library
  226. Tu, Z. W. 2007. Learning generative models via discriminative approaches. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'07).Google ScholarGoogle ScholarCross RefCross Ref
  227. Ulusoy, I. and Bishop, C. 2006. Comparison of generative and discriminative techniques for object detection and classification. In Toward Category-Level Object Recognition, J. Ponce, M. Hebert, C. Schmid, and A. Zisserman, Eds., Springer, 173--195.Google ScholarGoogle Scholar
  228. Ulusoy, I. and Bishop, C. M. 2005. Generative versus discriminative methods for object recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'05). Google ScholarGoogle ScholarDigital LibraryDigital Library
  229. Van De Sande, K., Gevers, T., and Snoek, C. 2008. Evaluation of color descriptors for object and scene recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08).Google ScholarGoogle Scholar
  230. Van De Sande, K., Gevers, T., and Snoek, C. 2010. Evaluating color descriptors for object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 32, 1582--1596. Google ScholarGoogle ScholarDigital LibraryDigital Library
  231. Van De Sande, K., Uijlings, J., Gevers, T., and Smeulders, A. 2011. Segmentation as selective search for object recognition. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'11). Google ScholarGoogle ScholarDigital LibraryDigital Library
  232. Van Gemert, J. C., Veenman, C. J., Smeulders, A. W. M., and Geusebroek, J. M. 2010. Visual word ambiguity. IEEE Trans. Pattern Anal. Mach. Intell. 32, 1271--1283. Google ScholarGoogle ScholarDigital LibraryDigital Library
  233. Vapnik, V. N. 1998. Statistical Learning Theory. A Wiley-Interscience Publication, New York.Google ScholarGoogle Scholar
  234. Varma, M. and Ray, D. 2007. Learning the discriminative power-invariance trade-off. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'07).Google ScholarGoogle Scholar
  235. Vedaldi, A., Gulshan, V., Varma, M., and Zisserman, A. 2009. Multiple kernels for object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'09).Google ScholarGoogle Scholar
  236. Verbeek, J. and Triggs, B. 2007a. Region classification with markov field aspect models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'07).Google ScholarGoogle Scholar
  237. Verbeek, J. and Triggs, B. 2007b. Scene segmentation with conditional random fields learned from partially labeled images. In Proceedings of the Conference on Advances in Neural Information Processing Systems. (NIPS'07).Google ScholarGoogle Scholar
  238. Vijayanarasimhan, S. and Grauman, K. 2011. Efficient region search for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'11). Google ScholarGoogle ScholarDigital LibraryDigital Library
  239. Viola, P. and Jones, M. 2001. Rapid object detection using a boosted cascade of simple features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'01).Google ScholarGoogle Scholar
  240. Walk, S., Majer, N., Schindler, K., and Schiele, B. 2010. New features and insights for pedestrian detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  241. Wang, G., Gallagher, A., Luo, J., and Forsyth, D. 2010a. Seeing people in social context: Recognizing people and social relationships. In Proceedings of the European Conference on Computer Vision (ECCV'10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  242. Wang, G., Hoiem, D., and Forsyth, D. 2009a. Building text features for object image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09).Google ScholarGoogle Scholar
  243. Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., and Gong, Y. 2010b. Locality-constrained linear coding for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  244. Wang, X. and Grimson, E. 2007. Spatial latent dirichlet allocation. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS'07).Google ScholarGoogle Scholar
  245. Wang, X., Han, T. X., and Yan, S. 2009b. An hog-lbp human detector with partial occlusion handling. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'09).Google ScholarGoogle Scholar
  246. Wang, Y. and Mori, G. 2009. Max-margin hidden conditional random fields for human action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09).Google ScholarGoogle Scholar
  247. Wang, Y. and Mori, G. 2010. Hidden part models for human action recognition: Probabilistic vs. max-margin. IEEE Trans. Pattern Anal. Mach. Intell. 33, 7, 1310--1323. Google ScholarGoogle ScholarDigital LibraryDigital Library
  248. Wang, Z., Hu, Y., and Chia, L.-T. 2010c. Image-to-class distance metric learning for image classification. In Proceedings of the European Conference on Computer Vision (ECCV'10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  249. Watanabe, T., Ito, S., and Yokoi, K. 2009. Co-occurrence histograms of oriented gradients for pedestrian detection. In Proceedings of the Pacific-Rim Symposium on Image and Video Technology (PSIVT'09). Google ScholarGoogle ScholarDigital LibraryDigital Library
  250. Wei, Y. C. and Tao, L.T. 2010. Efficient histogram-based sliding window. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  251. Winn, J., Criminisi, A., and Minka, T. 2005. Object categorization by learned universal visual dictionary. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'05). Google ScholarGoogle ScholarDigital LibraryDigital Library
  252. Wnuk, K. and Soatto, S. 2008. Filtering internet image search results towards keyword based category recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08).Google ScholarGoogle Scholar
  253. Wojek, C. and Schiele, B. 2008. A performance evaluation of single and multi-feature people detection. In Proceedings of the German Association for Pattern Recognition (DAGM'08). Google ScholarGoogle ScholarDigital LibraryDigital Library
  254. Wojek, C., Walk, S., and Schiele, B. 2009. Multi-cue onboard pedestrian detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09).Google ScholarGoogle Scholar
  255. Wright, J., Yi, M., Mairal, J., Sapiro, G., Huang, T. S., and Shuicheng, Y. 2010. Sparse representation for computer vision and pattern recognition. Proc. IEEE 98, 1031--1044.Google ScholarGoogle ScholarCross RefCross Ref
  256. Wu, B. and Nevatia, R. 2005. Detection of multiple, partially occluded humans in a single image by bayesian combination of edgelet part detectors. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'05). Google ScholarGoogle ScholarDigital LibraryDigital Library
  257. Wu, B. and Nevatia, R. 2007a. Cluster boosted tree classifier for multi-view, multi-pose object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'07).Google ScholarGoogle Scholar
  258. Wu, B. and Nevatia, R. 2007b. Improving part based object detection by unsupervised, online boosting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'07).Google ScholarGoogle Scholar
  259. Wu, B. and Nevatia, R. 2007c. Simultaneous object detection and segmentation by boosting local shape feature based classifier. In Proceedings of the IEEE Conference on Computer Vision and Pattern Reconition (CVPR'07).Google ScholarGoogle Scholar
  260. Wu, Z., Ke, Q. F., Isard, M., and Sun, J. 2009. Bundling features for large scale partial-duplicate web image search. In Proceedings of the IEEE Conferenc on Computer Vision and Pattern Recognition (CVPR'09).Google ScholarGoogle Scholar
  261. Xiang, Y., Zhou, X. D., Liu, Z. T., Chua, T. S., and Ngo, C.-W. 2010. Semantic context modeling with maximal margin conditional random fields for automatic image annotation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  262. Xu, H., Zhou, X., Wang, M., Xiang, Y., and Shi, B. 2009. Exploring flickr's related tags for semantic annotation of web images. In Proceedings of the ACM International Conference on Image and Video Retrieval (CIVR'09). Google ScholarGoogle ScholarDigital LibraryDigital Library
  263. Xu, Z., Chen, H., Zhu, S.-C., and Luo, J. 2008. A hierarchical compositional model for face representation and sketching. IEEE Trans. Pattern Anal. Mach. Intell. 30, 955--969. Google ScholarGoogle ScholarDigital LibraryDigital Library
  264. Xue, J.-H. 2008. Aspects of generative and discriminative classifiers. Tech. rep., Information and Mathematical Sciences, Department of Statistics, University of Glasgow.Google ScholarGoogle Scholar
  265. Xue, J.-H. and Titterington, D. 2008. Comment on “on discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes”. Neural Process. Lett. 28, 169--187. Google ScholarGoogle ScholarDigital LibraryDigital Library
  266. Xue, J.-H. and Titterington, D. M. 2010. On the generative-discriminative tradeoff approach: Interpretation, asymptotic efficiency and classification performance. Comput. Statist. Data Anal. 54, 438--451. Google ScholarGoogle ScholarDigital LibraryDigital Library
  267. Yan, P. K., Khan, S. M., and Shah, M. 2007. 3D model based object class detection in an arbitrary view. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'07).Google ScholarGoogle Scholar
  268. Yang, B., Mei, T., Sun, L.-F., Yang, S.-Q., and Hua, X.-S. 2008a. Free-shaped video collage. In Proceedings of the 14th International Conference on Advances in Multimedia Modeling. Google ScholarGoogle ScholarDigital LibraryDigital Library
  269. Yang, J. C., Yu, K., Gong, Y. H., and Huang, T. 2009. Linear spatial pyramid matching using sparse coding for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09).Google ScholarGoogle Scholar
  270. Yang, L., Jin, R., Sukthankar, R., and Jurie, F. 2008b. Unifying discriminative visual codebook generation with classifier training for object category recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08).Google ScholarGoogle Scholar
  271. Yang, Y. and Ramanan, D. 2011. Articulated pose estimation with flexible mixtures-of-parts. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'11). Google ScholarGoogle ScholarDigital LibraryDigital Library
  272. Yao, B. Z., Yang, X., Lin, L., Lee, M. W., and Zhu, S. C. 2010. I2T: Image parsing to text description. Proc. IEEE. 98, 1485--1508.Google ScholarGoogle ScholarCross RefCross Ref
  273. Yeh, T., Lee, J. J., and Darrell, T. 2009. Fast concurrent object localization and recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09).Google ScholarGoogle Scholar
  274. Yu, C. N. J. and Joachims, T. 2009. Learning structural svms with latent variables. In Proceedings of the International Conference on Machine Learning (ICML'09). Google ScholarGoogle ScholarDigital LibraryDigital Library
  275. Zhang, C., Liu, J., Tian, Q., Xu, C., Lu, H., and Ma, S. 2011a. Image classification by non-negative sparse coding, low-rank and sparse decomposition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'11). Google ScholarGoogle ScholarDigital LibraryDigital Library
  276. Zhang, D. Q. and Chang, S. F. 2006. A generative-discriminative hybrid method for multi-view object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  277. Zhang, J., Marszałek, M., Lazebnik, S., and Schmid, C. 2007. Local features and kernels for classification of texture and object categories: A comprehensive study. Int. J. Comput. Vis. 73, 213--238. Google ScholarGoogle ScholarDigital LibraryDigital Library
  278. Zhang, J. G., Huang, K. Q., Yu, Y. N., and Tan, T. N. 2011b. Boosted local structured hog-lbp for object localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'11). Google ScholarGoogle ScholarDigital LibraryDigital Library
  279. Zhang, Z. Q., Cao, Y., Salvi, D., Oliver, K., Waggoner, J., and Wang, S. 2010. Free-shape subwindow search for object localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  280. Zheng, W. S., Gong, S. G., and Xiang, T. 2009. Quantifying contextual information for object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'09).Google ScholarGoogle Scholar
  281. Zhu, L., Chen, Y., Lin, C., and Yuille, A. 2011. Max margin learning of hierarchical configural deformable templates (hcdts) for efficient object parsing and pose estimation. Int. J. Comput. Vis. 93, 1--21. Google ScholarGoogle ScholarDigital LibraryDigital Library
  282. Zhu, L., Chen, Y. H., Yuille, A., and Freeman, W. 2010. Latent hierarchical structural learning for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10).Google ScholarGoogle Scholar
  283. Zhu, M. 2004. Recall, precision and average precision. Working paper, University of Waterloo.Google ScholarGoogle Scholar
  284. Zhu, Q., Yeh, M. C., Cheng, K. T., and Avidan, S. 2006. Fast human detection using a cascade of histograms of oriented gradients. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  285. Zhu, S.-C. and Mumford, D. 2006. A stochastic grammar of images. Foundations Trends Comput. Graph. Vis. 2, 259--362. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Object class detection: A survey

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Computing Surveys
        ACM Computing Surveys  Volume 46, Issue 1
        October 2013
        551 pages
        ISSN:0360-0300
        EISSN:1557-7341
        DOI:10.1145/2522968
        Issue’s Table of Contents

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 11 July 2013
        • Accepted: 1 January 2013
        • Revised: 1 September 2012
        • Received: 1 May 2011
        Published in csur Volume 46, Issue 1

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader