Where Next in Object Recognition and how much Supervision Do We Need?

  • Chapter
Advanced Topics in Computer Vision

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

Object class recognition remains an active topic in computer vision that still presents many challenges. Most approaches address this task with supervised learning algorithms that need a large quantity of labels to perform well. This leads either to small datasets (<10,000 images) that are labeled with a controlled and verified procedure but capture only a subset of the real-world class distribution, or to large datasets that are more representative but contain more label noise. Semi-supervised learning has therefore emerged as a promising direction for object recognition: it requires only a few labels while making use of the vast number of images available today. In this chapter, we outline the main challenges of semi-supervised object recognition, review existing approaches, and emphasize open issues that should be addressed next to advance this research topic.

Author information

Corresponding author

Correspondence to Sandra Ebert.

Copyright information

© 2013 Springer-Verlag London

About this chapter

Cite this chapter

Ebert, S., Schiele, B. (2013). Where Next in Object Recognition and how much Supervision Do We Need?. In: Farinella, G., Battiato, S., Cipolla, R. (eds) Advanced Topics in Computer Vision. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-5520-1_2

  • DOI: https://doi.org/10.1007/978-1-4471-5520-1_2

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5519-5

  • Online ISBN: 978-1-4471-5520-1

  • eBook Packages: Computer Science, Computer Science (R0)
