Abstract
In applying the Hough transform to the problem of 3D shape recognition and registration, we develop two new and powerful improvements to this popular inference method. The first, intrinsic Hough, solves the problem of exponential memory requirements of the standard Hough transform by exploiting the sparsity of the Hough space. The second, minimum-entropy Hough, explains away incorrect votes, substantially reducing the number of modes in the posterior distribution of class and pose, and improving precision. Our experiments demonstrate that these contributions make the Hough transform not only tractable but also highly accurate for our example application. Both contributions can be applied to other tasks that already use the standard Hough transform.






Notes
Specifically, we use the Shannon (1948) entropy, \(H = \mathrm{E}[-\ln p(x)] = -\int p(x)\ln p(x)\,\mathrm{d}x\); a short numerical sketch follows these notes.
Strictly speaking, the minimum-entropy Hough transform is not a transform, because the probability of each location in Hough space cannot be computed independently.
A direct similarity is a transformation consisting of a rotation, a translation and a uniform scaling.
The requirement for a scale-independent optimization strategy is a further reason to use the proxy of Eq. (8).
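To make the entropy of Note 1 concrete, here is a minimal sketch (not from the paper) of how the Shannon entropy of a discretized Hough space could be evaluated. The function name, the NumPy histogram representation and the example vote distributions are illustrative assumptions.

```python
import numpy as np

def hough_entropy(votes: np.ndarray) -> float:
    """Return the Shannon entropy (in nats) of a non-negative vote histogram."""
    p = votes.astype(float).ravel()
    p = p / p.sum()          # normalize counts to a probability distribution
    p = p[p > 0]             # treat 0 * ln(0) as 0
    return float(-(p * np.log(p)).sum())

# A peaked vote distribution has lower entropy than a diffuse one, which is
# why minimizing entropy concentrates probability mass on fewer hypotheses.
peaked = np.array([0.0, 9.0, 1.0, 0.0])
diffuse = np.array([2.5, 2.5, 2.5, 2.5])
assert hough_entropy(peaked) < hough_entropy(diffuse)
```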
References
Toshiba CAD model point clouds dataset (2011). http://www.toshiba-europe.com/research/crl/cvg/projects/stereo_points.html.
Allan, M., & Williams, C. K. I. (2009). Object localisation using the generative template of features. Computer Vision and Image Understanding, 113, 824–838.
Ballard, D. H. (1981). Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognition, 13(2), 111–122.
Barinova, O., Lempitsky, V., & Kohli, P. (2010). On detection of multiple object instances using Hough transforms. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Ben-Tzvi, D., & Sandler, M. B. (1990). A combinatorial Hough transform. Pattern Recognition Letters, 11(3), 167–174.
Besag, J. (1986). On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society, Series B, 48(3), 259–302.
Birchfield, S., & Tomasi, C. (1999). Multiway cut for stereo and motion with slanted surfaces. In: Proceedings of the IEEE International Conference on Computer Vision.
Bober, M., & Kittler, J. (1993). Estimation of complex multimodal motion: An approach based on robust statistics and Hough transform. In: Proceedings of the British Machine Vision Conference.
Cheng, Y. (1995). Mean shift, mode seeking, and clustering. Transactions on Pattern Analysis and Machine Intelligence, 17(8), 790–799.
Delong, A., Osokin, A., Isack, H., & Boykov, Y. (2012). Fast approximate energy minimization with label costs. International Journal of Computer Vision, 96(1), 1–27.
Delong, A., Veksler, O., & Boykov, Y. (2012). Fast fusion moves for multi-model estimation. In: Proceedings of the European Conference on Computer Vision.
Drost, B., Ulrich, M., Navab, N., & Ilic, S. (2010). Model globally, match locally: Efficient and robust 3D object recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 998–1005).
Duda, R. O., & Hart, P. E. (1972). Use of the Hough transformation to detect lines and curves in pictures. Communications of the ACM, 15, 11–15.
Falk, H. (1970). Inequalities of J. W. Gibbs. American Journal of Physics, 38(7), 858–869.
Fisher, A., Fisher, R. B., Robertson, C., & Werghi, N. (1998). Finding surface correspondence for object recognition and registration using pairwise geometric histograms. In: Proceedings of the European Conference on Computer Vision (pp. 674–686).
Gall, J., & Lempitsky, V. (2009). Class-specific Hough forests for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1022–1029).
Gerig, G. (1987). Linking image-space and accumulator-space: A new approach for object-recognition. In: Proceedings of the IEEE International Conference on Computer Vision (pp. 112–117).
Hough, P. V. C. (1962). Method and means for recognizing complex patterns. U.S. Patent 3,069,654.
Illingworth, J., & Kittler, J. (1987). The adaptive Hough transform. Transactions on Pattern Analysis and Machine Intelligence, 9(5), 690–698.
Isack, H., & Boykov, Y. (2012). Energy-based geometric multi-model fitting. International Journal of Computer Vision, 97(2), 123–147.
Knopp, J., Prasad, M., Willems, G., Timofte, R., & Van Gool, L. (2010). Hough transform and 3D SURF for robust three dimensional classification. In: Proceedings of the European Conference on Computer Vision (pp. 589–602).
Lamdan, Y., & Wolfson, H. (1988). Geometric hashing: A general and efficient model-based recognition scheme. In: Proceedings of the IEEE International Conference on Computer Vision (pp. 238–249).
Leibe, B., Leonardis, A., & Schiele, B. (2004). Combined object categorization and segmentation with an implicit shape model. In: ECCV Workshop on Statistical Learning in Computer Vision.
Leibe, B., Leonardis, A., & Schiele, B. (2008). Robust object detection with interleaved categorization and segmentation. International Journal of Computer Vision, 77(1–3), 259–289.
Li, H., Lavin, M. A., & Le Master, R. J. (1986). Fast Hough transform: A hierarchical approach. Computer Vision, Graphics, and Image Processing, 36(2–3), 139–161.
MacKay, D. J. C. (2009). Information theory, inference, and learning algorithms. Cambridge: Cambridge University Press.
Maji, S., & Malik, J. (2009). Object detection using a max-margin Hough transform. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Mian, A., Bennamoun, M., & Owens, R. (2006). Three-dimensional model-based object recognition and segmentation in cluttered scenes. Transactions on Pattern Analysis and Machine Intelligence, 28(10), 1584–1601.
Minka, T. P. (2003). The ‘summation hack’ as an outlier model. Technical note.
Okada, R. (2009). Discriminative generalized Hough transform for object detection. In: Proceedings of the IEEE International Conference on Computer Vision (pp. 2000–2005).
Pham, M. T., Woodford, O. J., Perbet, F., Maki, A., Stenger, B., & Cipolla, R. (2011). A new distance for scale-invariant 3D shape recognition and registration. In: Proceedings of the IEEE International Conference on Computer Vision.
Rosten, E., & Loveland, R. (2009). Camera distortion self-calibration using the plumb-line constraint and minimal Hough entropy. Machine Vision and Applications.
Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27, 379–423, 623–656.
Sheikh, Y. A., Khan, E. A., & Kanade, T. (2007). Mode-seeking by medoidshifts. In: Proceedings of the IEEE International Conference on Computer Vision.
Stephens, R. S. (1991). A probabilistic approach to the Hough transform. Image and Vision Computing, 9(1), 66–71.
Toldo, R., & Fusiello, A. (2008). Robust multiple structures estimation with J-linkage. In: Proceedings of the European Conference on Computer Vision.
Tombari, F., & Di Stefano, L. (2010). Object recognition in 3D scenes with occlusions and clutter by Hough voting. In: Proceedings of the Pacific-Rim Symposium on Image and Video Technology (pp. 349–355).
Vedaldi, A., & Soatto, S. (2008). Quick shift and kernel methods for mode seeking. In: Proceedings of the European Conference on Computer Vision (pp. 705–718).
Vincent, E., & Laganiere, R. (2001). Detecting planar homographies in an image pair. In: Proceedings of the International Symposium on Image and Signal Processing and Analysis (pp. 182–187).
Vogiatzis, G., & Hernández, C. (2011). Video-based, real-time multi view stereo. Image and Vision Computing, 29(7), 434–441.
Woodford, O. J., Pham, M. T., Maki, A., Gherardi, R., Perbet, F., & Stenger, B. (2012). Contraction moves for geometric model fitting. In: Proceedings of the European Conference on Computer Vision.
Xu, L., Oja, E., & Kultanen, P. (1990). A new curve detection method: Randomized Hough transform (RHT). Pattern Recognition Letters, 11(5), 331–338.
Zhang, W., & Kosecká, J. (2007). Nonparametric estimation of multiple structures with outliers. In R. Vidal, A. Heyden, & Y. Ma (Eds.), Dynamical Vision, Lecture Notes in Computer Science, vol. 4358 (pp. 60–74). Heidelberg: Springer.
Zhang, Y., & Chen, T. (2010). Implicit shape kernel for discriminative learning of the Hough transform detector. In: Proceedings of the British Machine Vision Conference.
Zuliani, M., Kenney, C. S., & Manjunath, B. S. (2005). The multiRANSAC algorithm and its application to detect planar homographies. In: Proceedings of the IEEE International Conference on Image Processing.
Acknowledgments
The authors are extremely grateful to Bob Fisher, Andrew Fitzgibbon, Chris Williams, John Illingworth and the anonymous reviewers for providing valuable feedback on this work.
Appendix: Proof of the Integer Nature of Vote Weights
Theorem 1
Given Eq. (3), an integer set of optimal values of \(\boldsymbol{\theta}\) exists, i.e. one for which \(\theta_{ij}\in \{0,1\}~\forall i,j\).
Proof
Let \(\boldsymbol{\theta}^{\prime}\) denote a globally optimal value of \(\boldsymbol{\theta}\), i.e. one that minimizes Eq. (3). Consider only the vote weights of the \(i\)th feature, and assume the other vote weights are fixed at their optimal values, i.e. \(\boldsymbol{\theta}_j = \boldsymbol{\theta}_j^{\prime}~\forall j\ne i\). The objective function can then be written as
where \(C(\mathbf{y})\) is a function independent of \(\boldsymbol{\theta}_i\). Note that we have further assumed that the instance of \(\boldsymbol{\theta}_i\) in the \(\ln p(\mathbf{y}|\boldsymbol{\theta}_i)\) term in Eq. (14) is also at its optimal value, \(\boldsymbol{\theta}_i^{\prime}\). This allows us to rewrite the objective function as follows:
where \(D\) is a constant, as are the coefficients \(a_{ij}\). Given the constraints of Eq. (2), minimizing Eq. (16) with respect to \(\boldsymbol{\theta}_i\) can always be achieved by setting \(\theta_{ij} = 1\) for one \(j\) for which \(a_{ij}\) is largest, and setting all other weights to 0. In addition, Gibbs' inequality (Falk 1970) implies that Eq. (14) is minimized when \(\boldsymbol{\theta}_i = \boldsymbol{\theta}_i^{\prime}\), as we require. Therefore the \(i\)th feature has an integer set of optimal weights. The same argument applies to each feature independently. \(\square\)
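As a concrete illustration of Theorem 1 (a sketch under stated assumptions, not the paper's implementation): if the coefficients \(a_{ij}\) of Eq. (16) were available as a matrix, an optimal integer assignment is obtained by giving each feature's full weight to the vote with the largest coefficient, consistent with the constraints of Eq. (2). The matrix a and the helper below are hypothetical.

```python
import numpy as np

def integer_vote_weights(a: np.ndarray) -> np.ndarray:
    """Return one-hot weights theta with theta[i, j] in {0, 1} and each row
    summing to 1 (cf. Eq. (2)), putting feature i's weight on argmax_j a[i, j]."""
    theta = np.zeros_like(a, dtype=float)
    theta[np.arange(a.shape[0]), np.argmax(a, axis=1)] = 1.0
    return theta

# Example: 3 features voting over 4 Hough-space locations.
a = np.array([[0.1, 0.7, 0.2, 0.0],
              [0.4, 0.3, 0.2, 0.1],
              [0.0, 0.1, 0.1, 0.8]])
theta = integer_vote_weights(a)
assert np.all((theta == 0.0) | (theta == 1.0))
assert np.allclose(theta.sum(axis=1), 1.0)
```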
Cite this article
Woodford, O. J., Pham, M. T., Maki, A., et al. Demisting the Hough Transform for 3D Shape Recognition and Registration. Int J Comput Vis 106, 332–341 (2014). https://doi.org/10.1007/s11263-013-0623-2