
Efficient object detection using convolutional neural network-based hierarchical feature modeling

  • Original Paper
Signal, Image and Video Processing

Abstract

This paper presents a hierarchical, data-driven object detection framework built on a deep feature hierarchy of object appearances. Many object detectors lose accuracy because appearances are ambiguous between classes and vary widely within a class, whereas deep features extracted from visual objects show a strong hierarchical clustering property. To exploit this property, deep features are partitioned without supervision into super-categories at the inter-class level and into augmented categories at the object level. A hierarchical feature model is then built with a latent topic model algorithm, and a one-versus-all support vector machine is attached to each node so that the nodes together form a hierarchical classification ensemble. Extensive experiments on the PASCAL VOC 2007 and VOC 2012 datasets show that the proposed method outperforms state-of-the-art techniques.
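The two-level idea in the abstract (unsupervised super-categories, then a one-versus-all SVM ensemble at each node) can be illustrated with a toy sketch. This is not the authors' implementation: the 2-D vectors stand in for CNN features, a plain 2-means over per-class mean features replaces the latent topic model, the SVMs are trained by simple subgradient descent on the hinge loss, and every name (`group_classes`, `train_node`, `predict`, the class labels) is hypothetical.

```python
# Toy two-level hierarchy: unsupervised super-categories plus one-vs-all
# linear SVMs at each node. All data and names here are illustrative assumptions.

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def group_classes(centroids, iters=10):
    """2-means over per-class mean features, farthest-point initialisation."""
    names = list(centroids)
    first = centroids[names[0]]
    far = max(names, key=lambda n: sqdist(centroids[n], first))
    centers = [first, centroids[far]]
    for _ in range(iters):
        assign = {n: min((0, 1), key=lambda k: sqdist(centroids[n], centers[k]))
                  for n in names}
        centers = [mean([centroids[n] for n in names if assign[n] == k])
                   for k in (0, 1)]
    return assign

def train_svm(X, y, lam=0.01, lr=0.1, epochs=100):
    """Linear SVM via subgradient descent on the L2-regularised hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            violated = t * (dot(w, x) + b) < 1   # margin check before the step
            w = [wi * (1 - lr * lam) for wi in w]  # regularisation shrink
            if violated:
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b

def train_node(X, labels):
    """One-vs-all ensemble: one binary SVM per distinct label at this node."""
    return {c: train_svm(X, [1 if l == c else -1 for l in labels])
            for c in sorted(set(labels))}

def node_predict(node, x):
    return max(node, key=lambda c: dot(node[c][0], x) + node[c][1])

# Made-up 2-D "deep features": animals sit at x < 0, vehicles at x > 0.
features = {
    "cat": [(-5.0, 1.0), (-4.8, 1.2)],
    "dog": [(-5.2, -1.0), (-4.8, -1.2)],
    "car": [(5.0, 1.0), (4.8, 1.2)],
    "bus": [(5.2, -1.0), (4.8, -1.2)],
}

# Inter-class level: cluster class centroids into two super-categories.
super_of = group_classes({c: mean(v) for c, v in features.items()})

# Root node separates super-categories; each child separates its own classes.
samples = [(x, c) for c, xs in features.items() for x in xs]
root = train_node([x for x, _ in samples], [super_of[c] for _, c in samples])
children = {}
for s in set(super_of.values()):
    subset = [(x, c) for x, c in samples if super_of[c] == s]
    children[s] = train_node([x for x, _ in subset], [c for _, c in subset])

def predict(x):
    """Route a feature vector from the root to a leaf class."""
    s = node_predict(root, x)
    return s, node_predict(children[s], x)
```

Calling `predict((-5.0, 2.0))` first picks a super-category at the root and then a class inside it. The point of the routing is that each child ensemble only has to separate classes that already look alike, which is where inter-class ambiguity is concentrated.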



Acknowledgments

This work was supported by an Inha University research grant. A GPU used in this research was generously donated by NVIDIA Corporation.

Author information

Corresponding author

Correspondence to Phill Kyu Rhee.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (docx 4923 KB)

About this article

Cite this article

Lee, B., Erdenee, E., Jin, S. et al. Efficient object detection using convolutional neural network-based hierarchical feature modeling. SIViP 10, 1503–1510 (2016). https://doi.org/10.1007/s11760-016-0962-x

  • DOI: https://doi.org/10.1007/s11760-016-0962-x
