Learning to Recognize Objects with Little Supervision

Carbonetto, Peter; Dorkó, Gyuri; Schmid, Cordelia; Kück, Hendrik; de Freitas, Nando

doi:10.1007/s11263-007-0067-7

Published: 21 July 2007

Volume 77, pages 219–237 (2008)

International Journal of Computer Vision

Peter Carbonetto¹,
Gyuri Dorkó²,
Cordelia Schmid²,
Hendrik Kück¹ &
Nando de Freitas¹

Abstract

This paper shows (i) improvements over state-of-the-art local feature recognition systems, (ii) how to formulate principled models for automatic local feature selection in object class recognition when there is little supervised data, and (iii) how to formulate sensible spatial image context models using a conditional random field for integrating local features and segmentation cues (superpixels). By adopting sparse kernel methods, Bayesian learning techniques and data association with constraints, the proposed model identifies the most relevant sets of local features for recognizing object classes, achieves performance comparable to the fully supervised setting, and obtains excellent results for image classification.

Author information

Authors and Affiliations

University of British Columbia, Vancouver, Canada
Peter Carbonetto, Hendrik Kück & Nando de Freitas
INRIA Rhône-Alpes, Grenoble, France
Gyuri Dorkó & Cordelia Schmid

Corresponding author

Correspondence to Peter Carbonetto.

About this article

Cite this article

Carbonetto, P., Dorkó, G., Schmid, C. et al. Learning to Recognize Objects with Little Supervision. Int J Comput Vis 77, 219–237 (2008). https://doi.org/10.1007/s11263-007-0067-7

Received: 29 August 2005
Accepted: 29 May 2007
Published: 21 July 2007
Issue Date: May 2008
DOI: https://doi.org/10.1007/s11263-007-0067-7