Abstract
This paper shows (i) improvements over state-of-the-art local feature recognition systems, (ii) how to formulate principled models for automatic local feature selection in object class recognition when there is little supervised data, and (iii) how to formulate sensible spatial image context models using a conditional random field for integrating local features and segmentation cues (superpixels). By adopting sparse kernel methods, Bayesian learning techniques and data association with constraints, the proposed model identifies the most relevant sets of local features for recognizing object classes, achieves performance comparable to the fully supervised setting, and obtains excellent results for image classification.
Cite this article
Carbonetto, P., Dorkó, G., Schmid, C. et al. Learning to Recognize Objects with Little Supervision. Int J Comput Vis 77, 219–237 (2008). https://doi.org/10.1007/s11263-007-0067-7
DOI: https://doi.org/10.1007/s11263-007-0067-7