Abstract
We present a novel max-margin Hough transform with latent structure for joint object detection and pose estimation. Our method addresses the large appearance and shape variation of objects across multiple poses by integrating three key components. First, we propose a more robust appearance model by designing a patch dictionary with complementary features. Second, we use a group of latent components to explicitly incorporate feature selection and pooling into the Hough-based object models. Third, we adopt a multiple instance learning approach to handle the lack of correspondence among training instances with noisy bounding-box labels. We design a unified objective and an efficient approximate inference procedure that alternates the search between object location space and pose space. We demonstrate the efficacy of our approach by achieving state-of-the-art performance on two detection datasets and two joint estimation datasets.
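As a rough illustration of the alternating inference mentioned in the abstract, the toy sketch below scores a weighted Hough accumulator per pose and then performs coordinate ascent, switching between the best location for a fixed pose and the best pose for a fixed location. The function names, array shapes, and the plain coordinate-ascent scheme are assumptions made for illustration and are not taken from the paper. The latent components and the multiple instance learning step are omitted, and the weights here are arbitrary numbers rather than learned max-margin parameters.

```python
import numpy as np


def hough_score(votes, weights):
    """Weighted Hough accumulator (hypothetical helper, not the paper's code).

    votes:   array of shape (n_poses, n_codewords, H, W), one vote map per
             dictionary codeword and pose
    weights: array of shape (n_poses, n_codewords), per-codeword weights
    returns: array of shape (n_poses, H, W) with a score for every
             (pose, location) pair
    """
    return np.einsum("pk,pkhw->phw", weights, votes)


def alternating_search(score, pose_init=0, max_iters=10):
    """Coordinate ascent over (pose, location) on a precomputed score volume.

    This is an approximate search: it can stop at a local optimum that
    depends on pose_init.
    """
    pose = pose_init
    loc = None
    for _ in range(max_iters):
        # Step 1: keep the pose fixed and pick the best location.
        new_loc = np.unravel_index(np.argmax(score[pose]), score[pose].shape)
        # Step 2: keep the location fixed and pick the best pose.
        new_pose = int(np.argmax(score[:, new_loc[0], new_loc[1]]))
        if new_loc == loc and new_pose == pose:
            break  # neither coordinate changed, so the search has converged
        loc, pose = new_loc, new_pose
    best = float(score[pose, loc[0], loc[1]])
    return pose, (int(loc[0]), int(loc[1])), best
```

Because each step only improves one coordinate at a time, the result can depend on the initial pose, so a practical detector would likely restart the search from several poses and keep the highest-scoring hypothesis.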
Acknowledgement
This work was supported by the National Natural Science Foundation of China (Project No. 61503168).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Li, H., He, X., Barnes, N., Wang, M. (2016). Learning Hough Transform with Latent Structures for Joint Object Detection and Pose Estimation. In: Tian, Q., Sebe, N., Qi, GJ., Huet, B., Hong, R., Liu, X. (eds) MultiMedia Modeling. MMM 2016. Lecture Notes in Computer Science, vol 9517. Springer, Cham. https://doi.org/10.1007/978-3-319-27674-8_11
DOI: https://doi.org/10.1007/978-3-319-27674-8_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27673-1
Online ISBN: 978-3-319-27674-8
eBook Packages: Computer Science, Computer Science (R0)