Hybrid Generative-Discriminative Visual Categorization

Holub, Alex D.; Welling, Max; Perona, Pietro

doi:10.1007/s11263-007-0084-6

Published: 18 October 2007

Volume 77, pages 239–258, (2008)

International Journal of Computer Vision

Abstract

Learning models for detecting and classifying object categories is a challenging problem in machine vision. While discriminative approaches to learning and classification have, in principle, superior performance, generative approaches provide many useful features. One of these is the ability to naturally establish explicit correspondence between model components and scene features; this, in turn, allows for the handling of missing data and for unsupervised learning in clutter. We explore a hybrid generative/discriminative approach using ‘Fisher Kernels’ (Jaakkola, T., et al. in Advances in neural information processing systems, Vol. 11, pp. 487–493, 1999), which retains most of the desirable properties of generative methods while increasing classification performance through a discriminative setting. Our experiments, conducted on a number of popular benchmarks, show strong performance improvements over the corresponding generative approach. In addition, we demonstrate how this hybrid learning paradigm can be extended to address several outstanding challenges within computer vision, including combining multiple object models and learning with unlabeled data.
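The Fisher kernel idea mentioned in the abstract can be illustrated on a deliberately simple generative model. The sketch below is not the authors' method: it assumes a diagonal Gaussian (rather than the object models studied in the article), approximates the Fisher information F by the diagonal of its empirical estimate, and uses hypothetical function names. Each input x is mapped to its Fisher score U_x = ∇_θ log p(x | θ), and the kernel is K(x, y) = U_x^T F^{-1} U_y.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def fit_diag_gaussian(data: Sequence[Vector]) -> Tuple[List[float], List[float]]:
    """Maximum-likelihood mean and standard deviation for each dimension."""
    n, dim = len(data), len(data[0])
    mu = [sum(x[d] for x in data) / n for d in range(dim)]
    sigma = [math.sqrt(sum((x[d] - mu[d]) ** 2 for x in data) / n) for d in range(dim)]
    return mu, sigma


def fisher_score(x: Vector, mu: Vector, sigma: Vector) -> List[float]:
    """Gradient of log N(x | mu, diag(sigma^2)) with respect to (mu, sigma)."""
    d_mu = [(xi - m) / s ** 2 for xi, m, s in zip(x, mu, sigma)]
    d_sigma = [-1.0 / s + (xi - m) ** 2 / s ** 3 for xi, m, s in zip(x, mu, sigma)]
    return d_mu + d_sigma


def diag_fisher_information(data: Sequence[Vector], mu: Vector, sigma: Vector) -> List[float]:
    """Assumed approximation: keep only the diagonal of the empirical
    Fisher information, i.e. the mean squared score over the training data."""
    scores = [fisher_score(x, mu, sigma) for x in data]
    return [sum(u[i] ** 2 for u in scores) / len(scores) for i in range(len(scores[0]))]


def fisher_kernel(x: Vector, y: Vector, mu: Vector, sigma: Vector,
                  fisher_diag: Vector) -> float:
    """K(x, y) = U_x^T F^{-1} U_y with F replaced by its diagonal."""
    ux = fisher_score(x, mu, sigma)
    uy = fisher_score(y, mu, sigma)
    return sum(a * b / f for a, b, f in zip(ux, uy, fisher_diag))


if __name__ == "__main__":
    # Toy training set (hypothetical data, two features per example).
    train = [[0.0, 1.0], [1.0, 3.0], [2.0, 2.0], [3.0, 2.0]]
    mu, sigma = fit_diag_gaussian(train)
    fisher_diag = diag_fisher_information(train, mu, sigma)
    # Gram matrix over the training set, to be handed to a kernel classifier.
    gram = [[fisher_kernel(a, b, mu, sigma, fisher_diag) for b in train] for a in train]
    for row in gram:
        print(["%.3f" % v for v in row])
```

The generative model is fitted first and then frozen; the discriminative step happens when the Gram matrix is passed to a kernel classifier such as an SVM. The sketch assumes no feature has zero variance, since sigma appears in denominators.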

References

Burl, M., & Perona, P. (1996). Recognition of planar object classes. In Computer vision and pattern recognition (CVPR) (p. 223).
Chapelle, O., Vapnik, V., Bousquet, O., & Mukherjee, S. (2002). Choosing multiple parameters for support vector machines. Machine Learning, 46, 131–159.
Crowley, J. L. (1984). A representation for shape based on peaks and ridges in the difference of low pass transform. In Pattern recognition and machine intelligence (PAMI).
Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39, 1–38.
Dorko, G., & Schmid, C. (2005). Object class recognition using discriminative local features (Technical Report RR-5497). INRIA.
Fei-Fei, L., Fergus, R., & Perona, P. (2004). Learning generative visual models from few training examples. In Computer vision and pattern recognition (CVPR) workshop on GMBV.
Fergus, R. (2005). Visual object recognition. Thesis, Department of Engineering Science, University of Oxford.
Fergus, R., Perona, P., & Zisserman, A. (2003). Object class recognition by unsupervised scale-invariant learning. In Computer vision and pattern recognition (CVPR) (Vol. 2, p. 264).
Gold, C., Holub, A., & Sollich, P. (2005). Bayesian approach to feature selection and parameter tuning for support vector machine classifiers. In Neural Networks.
Holub, A., Welling, M., & Perona, P. (2005). Combining generative models and Fisher kernels for object class recognition. In International conference on computer vision (ICCV).
Holub, A., & Perona, P. (2005). A discriminative framework for modeling object class. In Computer vision and pattern recognition (CVPR).
Jaakkola, T., Diekhans, M., & Haussler, D. (1999). Exploiting generative models in discriminative classifiers. In Advances in neural information processing systems (NIPS) (Vol. 11, pp. 487–493).
Jaakkola, T., & Haussler, D. (1999). Probabilistic kernel regression models. In Proceedings of the seventh international workshop on artificial intelligence and statistics.
Kadir, T., & Brady, M. (2001). Saliency, scale and image description. International Journal of Computer Vision, 45(2), 83–105.
Leibe, B., & Schiele, B. (2004). Scale-invariant object categorization using a scale-adaptive mean-shift search. In DAGM-symposium (pp. 145–153).
Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 91–110.
Ng, A., & Jordan, M. (2002). On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. In Advances in neural information processing systems (NIPS) (Vol. 12).
Opelt, A., Fussenegger, M., Pinz, A., & Auer, P. (2004). Weak hypotheses and boosting for generic object detection and recognition. In European conference on computer vision (ECCV) (pp. 71–84).
Opper, M., & Winther, O. (2000). Gaussian processes and SVM: Mean field and leave-one-out. In Advances in large margin classifiers (pp. 311–326). Cambridge: MIT Press.
Schneiderman, H. (2004). Learning a restricted Bayesian network for object detection. In Computer vision and pattern recognition (CVPR) (pp. 639–646).
Schoelkopf, B., & Smola, A. (2002). Learning with kernels. Cambridge: MIT Press.
Seeger, M. (2002). Covariance kernels from Bayesian generative models. In Advances in neural information processing systems (NIPS) (Vol. 14, pp. 905–912).
Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. Cambridge: Cambridge University Press.
Torralba, A., Murphy, K. P., & Freeman, W. T. (2004). Sharing visual features for multiclass and multiview object detection. In Computer vision and pattern recognition (CVPR).
Tsuda, K., Akaho, S., Kawanabe, M., & Müller, K.-R. (2003). Asymptotic properties of the Fisher kernel. citeseer.ist.psu.edu/tsuda03asymptotic.html.
Ullman, S., Vidal-Naquet, M., & Sali, E. (2002). Visual features of intermediate complexity and their use in classification. In Nature neuroscience (pp. 682–687).
Vapnik, V. (1998). Statistical learning theory. New York: Wiley–Interscience.
Vasconcelos, N., Ho, P., & Moreno, P. (2004). The Kullback–Leibler kernel as a framework for discriminant and localized representations for visual recognition. In European conference on computer vision (ECCV) (pp. 430–441).
Wallraven, C., Caputo, B., & Graf, A. B. A. (2003). Recognition with local features: the kernel recipe. In International conference on computer vision (ICCV) (pp. 257–264).
Weber, M., Welling, M., & Perona, P. (2000). Towards automatic discovery of object categories. In Computer vision and pattern recognition (CVPR) (p. 2101).

Author information

Authors and Affiliations

Computation and Neural Systems, California Institute of Technology, MC 136-93, Pasadena, CA, 91125, USA
Alex D. Holub & Pietro Perona
Department of Computer Science, University of California Irvine, Irvine, CA, 92697-3425, USA
Max Welling

Corresponding author

Correspondence to Alex D. Holub.

About this article

Cite this article

Holub, A.D., Welling, M. & Perona, P. Hybrid Generative-Discriminative Visual Categorization. Int J Comput Vis 77, 239–258 (2008). https://doi.org/10.1007/s11263-007-0084-6

Received: 14 September 2005
Accepted: 15 August 2007
Issue Date: May 2008