Hybrid Generative-Discriminative Visual Categorization

International Journal of Computer Vision

Abstract

Learning models for detecting and classifying object categories is a challenging problem in machine vision. While discriminative approaches to learning and classification have, in principle, superior performance, generative approaches provide many useful features, one of which is the ability to naturally establish explicit correspondence between model components and scene features. This, in turn, allows for the handling of missing data and unsupervised learning in clutter. We explore a hybrid generative/discriminative approach, using ‘Fisher Kernels’ (Jaakkola, T., et al. in Advances in neural information processing systems, Vol. 11, pp. 487–493, 1999), which retains most of the desirable properties of generative methods while increasing classification performance through a discriminative setting. Our experiments, conducted on a number of popular benchmarks, show strong performance improvements over the corresponding generative approach. In addition, we demonstrate how this hybrid learning paradigm can be extended to address several outstanding challenges within computer vision, including combining multiple object models and learning with unlabeled data.
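To make the Fisher kernel idea concrete, the following is a minimal sketch, not the authors' implementation. The generative model used in the paper is not described in this preview, so a single one-dimensional Gaussian stands in for it here. The function names `fisher_score` and `fisher_kernel` are hypothetical, and replacing the Fisher information matrix with the identity is a simplifying assumption. The Fisher score of an image is the gradient of its log-likelihood with respect to the model parameters, and the kernel compares two images through their scores.

```python
def fisher_score(features, mu, var):
    """Gradient of sum_i log N(x_i | mu, var) with respect to (mu, var).

    `features` is a list of scalar feature values from one image
    (an assumed stand-in for real image descriptors).
    """
    d_mu = sum((x - mu) / var for x in features)
    d_var = sum(((x - mu) ** 2 - var) / (2.0 * var ** 2) for x in features)
    return (d_mu, d_var)


def fisher_kernel(feats_a, feats_b, mu, var):
    """Inner product of two Fisher scores.

    Assumption: the Fisher information matrix is approximated by the identity.
    """
    score_a = fisher_score(feats_a, mu, var)
    score_b = fisher_score(feats_b, mu, var)
    return sum(p * q for p, q in zip(score_a, score_b))


if __name__ == "__main__":
    # Two "images" with different numbers of features still yield
    # fixed-length score vectors, so they can be compared by a kernel.
    print(fisher_score([1.0, 3.0], mu=2.0, var=1.0))
    print(fisher_kernel([3.0], [4.0, 2.5, 1.0], mu=2.0, var=1.0))
```

Because every image maps to a score vector whose length equals the number of model parameters, the resulting kernel can be handed to a standard discriminative classifier such as a support vector machine, which is the sense in which the approach is hybrid.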

References

  • Burl, M., & Perona, P. (1996). Recognition of planar object classes. In Computer vision and pattern recognition (CVPR) (p. 223).

  • Chapelle, O., Vapnik, V., Bousquet, O., & Mukherjee, S. (2002). Choosing multiple parameters for support vector machines. Machine Learning, 46, 131–159.

  • Crowley, J. L. (1984). A representation for shape based on peaks and ridges in the difference of low pass transform. In Pattern recognition and machine intelligence (PAMI).

  • Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39, 1–38.

  • Dorko, G., & Schmid, C. (2005). Object class recognition using discriminative local features (Technical Report RR-5497). INRIA.

  • Fei-Fei, L., Fergus, R., & Perona, P. (2004). Learning generative visual models from few training examples. In Computer vision and pattern recognition (CVPR) workshop on GMBV.

  • Fergus, R. (2005). Visual object recognition. Thesis, Department of Engineering Science, University of Oxford.

  • Fergus, R., Perona, P., & Zisserman, A. (2003). Object class recognition by unsupervised scale-invariant learning. In Computer vision and pattern recognition (CVPR) (Vol. 2, p. 264).

  • Gold, C., Holub, A., & Sollich, P. (2005). Bayesian approach to feature selection and parameter tuning for support vector machine classifiers. In Neural Networks.

  • Holub, A., Welling, M., & Perona, P. (2005). Combining generative models and Fisher kernels for object class recognition. In International conference on computer vision (ICCV).

  • Holub, A., & Perona, P. (2005). A discriminative framework for modeling object class. In Computer vision and pattern recognition (CVPR).

  • Jaakkola, T., Diekhans, M., & Haussler, D. (1999). Exploiting generative models in discriminative classifiers. In Advances in neural information processing systems (NIPS) (Vol. 11, pp. 487–493).

  • Jaakkola, T., & Haussler, D. (1999). Probabilistic kernel regression models. In Proceedings of the seventh international workshop on artificial intelligence and statistics.

  • Kadir, T., & Brady, M. (2001). Saliency, scale and image description. International Journal of Computer Vision, 45(2), 83–105.

  • Leibe, B., & Schiele, B. (2004). Scale-invariant object categorization using a scale-adaptive mean-shift search. In DAGM-symposium (pp. 145–153).

  • Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 91–110.

  • Ng, A., & Jordan, M. (2002). On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. In Advances in neural information processing systems (NIPS) (Vol. 12).

  • Opelt, A., Fussenegger, M., Pinz, A., & Auer, P. (2004). Weak hypotheses and boosting for generic object detection and recognition. In European conference on computer vision (ECCV) (pp. 71–84).

  • Opper, M., & Winther, O. (2000). Gaussian processes and SVM: Mean field and leave-one-out. In Advances in large margin classifiers (pp. 311–326). Cambridge: MIT Press.

  • Schneiderman, H. (2004). Learning a restricted Bayesian network for object detection. In Computer vision and pattern recognition (CVPR) (pp. 639–646).

  • Schoelkopf, B., & Smola, A. (2002). Learning with kernels. Cambridge: MIT Press.

  • Seeger, M. (2002). Covariance kernels from Bayesian generative models. In Advances in neural information processing systems (NIPS) (Vol. 14, pp. 905–912).

  • Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. Cambridge: Cambridge University Press.

  • Torralba, A., Murphy, K. P., & Freeman, W. T. (2004). Sharing visual features for multiclass and multiview object detection. In Computer vision and pattern recognition (CVPR).

  • Tsuda, K., Akaho, S., Kawanabe, M., & Müller, K.-R. (2003). Asymptotic properties of the Fisher kernel. citeseer.ist.psu.edu/tsuda03asymptotic.html.

  • Ullman, S., Vidal-Naquet, M., & Sali, E. (2002). Visual features of intermediate complexity and their use in classification. Nature Neuroscience, 5(7), 682–687.

  • Vapnik, V. (1998). Statistical learning theory. New York: Wiley–Interscience.

  • Vasconcelos, N., Ho, P., & Moreno, P. (2004). The Kullback–Leibler kernel as a framework for discriminant and localized representations for visual recognition. In European conference on computer vision (ECCV) (pp. 430–441).

  • Wallraven, C., Caputo, B., & Graf, A. B. A. (2003). Recognition with local features: the kernel recipe. In International conference on computer vision (ICCV) (pp. 257–264).

  • Weber, M., Welling, M., & Perona, P. (2000). Towards automatic discovery of object categories. In Computer vision and pattern recognition (CVPR) (p. 2101).

Author information

Correspondence to Alex D. Holub.

About this article

Cite this article

Holub, A.D., Welling, M. & Perona, P. Hybrid Generative-Discriminative Visual Categorization. Int J Comput Vis 77, 239–258 (2008). https://doi.org/10.1007/s11263-007-0084-6

  • DOI: https://doi.org/10.1007/s11263-007-0084-6
