Visual Object Detection Using Cascades of Binary and One-Class Classifiers

International Journal of Computer Vision

Abstract

We describe an efficient approach to visual object detection that uses short cascades of asymmetric ‘one class’ classifiers to quickly reject negatives (windows not centered on an object of the desired class) within a sliding window framework. Current detectors typically use binary discriminants such as Support Vector Machines or Boosting to implement each stage of the cascade. These treat the positive and negative classes symmetrically. We argue that this is suboptimal because object detectors typically see a great many negative windows with extremely diverse contents and only a few positive ones with comparatively coherent contents. We show that asymmetric representations that focus on tightly modeling the extent of the rare, coherent positive class can lead to simpler classifiers and faster rejection. Our cascades use asymmetric classifiers based on simple convex models to progressively tighten the bound on the positive class. They typically start with a conventional linear SVM for initial pruning, followed by a cascade of linear distance-to-hyperplane and interior-of-hypersphere classifiers and finishing with a kernelized hypersphere classifier. We show that the resulting detectors have competitive performance on the Labeled Faces in the Wild dataset and state-of-the-art performance on the FDDB face detection, ESOGU face detection and INRIA Person datasets. The results on the PASCAL VOC 2007 dataset are also respectable given that they use neither object parts nor context. The one-class formulations provide significant reductions in classifier complexity relative to the corresponding two-class ones, making them suitable for real-world applications.
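The early-rejection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`linear_stage`, `slab_stage`, `sphere_stage`, `run_cascade`), the toy parameters and the 2-D feature vectors are all assumptions made for illustration, and the final kernelized hypersphere stage is omitted. Each stage is a simple convex acceptance region, and a window is only passed to the next stage if every previous stage accepted it.

```python
import numpy as np

def linear_stage(w, b):
    """Binary linear discriminant: accept rows x with w.x + b >= 0."""
    return lambda X: X @ w + b >= 0.0

def slab_stage(w, b, delta):
    """Distance-to-hyperplane stage: accept rows within delta of the plane."""
    norm = np.linalg.norm(w)
    return lambda X: np.abs(X @ w + b) / norm <= delta

def sphere_stage(center, radius):
    """Interior-of-hypersphere stage: accept rows with ||x - center|| <= radius."""
    return lambda X: np.sum((X - center) ** 2, axis=1) <= radius ** 2

def run_cascade(X, stages):
    """Return a boolean mask of the rows of X accepted by every stage.

    Each stage is evaluated only on the rows that survived the earlier
    stages, so cheap early stages save work for the later ones.
    """
    alive = np.arange(X.shape[0])
    for stage in stages:
        if alive.size == 0:
            break
        alive = alive[stage(X[alive])]
    mask = np.zeros(X.shape[0], dtype=bool)
    mask[alive] = True
    return mask

# Toy feature vectors standing in for sliding-window descriptors (hypothetical).
X = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 3.0], [-2.0, 1.0]])
stages = [
    linear_stage(w=np.array([1.0, 0.0]), b=-0.5),            # keep x0 >= 0.5
    slab_stage(w=np.array([0.0, 1.0]), b=-1.0, delta=0.5),   # keep |x1 - 1| <= 0.5
    sphere_stage(center=np.array([1.0, 1.0]), radius=1.0),   # keep points near (1, 1)
]
mask = run_cascade(X, stages)
print(mask)
```

In this toy run only the second window survives all three stages. The first and last are rejected by the linear stage and the third by the slab stage, so the hypersphere test is evaluated on a single window.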

Notes

  1. The name “one class” is conventional. It emphasizes the origin of these methods in density modeling and the predominant role of the positive class but it is something of a misnomer in that negative examples usually can be, often are, and in some formulations must be included during training.

  2. http://cmp.felk.cvut.cz/~xfrancv/ocas/html/index.html.

  3. http://www.csie.ntu.edu.tw/~cjlin/libsvm.

  4. It would be possible to learn \(\varDelta \) by including a \((\text {weight})\cdot \,\varDelta \) term in the cost function, but we have not done this here owing to a limitation of the QP solver that we used. Instead we set \(\varDelta \) directly using cross validation. (Cross validation might be needed in any case, to set the weight.)

  5. http://cmp.felk.cvut.cz.

  6. The code is available from http://mlcv.ogu.edu.tr/softwares.html.

  7. http://mlcvdb.ogu.edu.tr/facedetection.html.

  8. http://picasa.google.com.

  9. Typically only a few hundred to about a thousand per class, partitioned among 3 pairs of roots.

  10. http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2007/results/index.shtml.

  11. http://www.cs.berkeley.edu/~rbg/latent/index.html.

Acknowledgements

This work was supported in part by the Scientific and Technological Research Council of Turkey (TUBİTAK) under Grant Number EEEAG-109E279.

Author information

Corresponding author

Correspondence to Hakan Cevikalp.

Additional information

Communicated by Takayuki Okatani.

Cite this article

Cevikalp, H., Triggs, B. Visual Object Detection Using Cascades of Binary and One-Class Classifiers. Int J Comput Vis 123, 334–349 (2017). https://doi.org/10.1007/s11263-016-0986-2

  • DOI: https://doi.org/10.1007/s11263-016-0986-2
