
Bayesian perspective-plane (BPP) with maximum likelihood searching for visual localization


Abstract

The "Perspective-Plane" problem proposed in this paper is similar to the well-known "Perspective-n-Point (PnP)" and "Perspective-n-Line (PnL)" problems in computer vision. However, it has broader applicability, because planar scenes are more widely available in daily life than control points or lines. We address this problem within the Bayesian framework and propose the "Bayesian Perspective-Plane (BPP)" algorithm, which can handle generalized constraints rather than type-specific ones. The BPP algorithm consists of three steps: 1) plane normal computation by maximum likelihood searching from the Bayesian formulation; 2) plane distance computation; and 3) visual localization. In the first step, computation of the plane normal is formulated within the Bayesian framework and solved with the proposed Maximum Likelihood Searching Model (MLS-M); 2D and 1D searching modes are discussed. MLS-M can incorporate generalized planar and out-of-plane deterministic constraints. With the computed normal, the plane distance is recovered from a reference length or distance, after which the positions of the object or the camera can be determined. Extensions of the BPP algorithm to un-calibrated images and to camera calibration are discussed. The algorithm has been tested with both simulated and real image data. In the experiments, it was applied to recover planar structure and localize objects using different types of constraints, and the 2D and 1D searching modes were illustrated for plane normal computation. The results demonstrate that the algorithm is accurate and generalizes well for object localization. Extensions of the proposed model to camera calibration were also illustrated, and the potential of the algorithm was further demonstrated by solving the classic Perspective-Three-Point (P3P) problem and classifying its solutions. The proposed BPP algorithm offers a practical and effective approach for visual localization.




Acknowledgments

The work presented in this paper was sponsored by the National Natural Science Foundation of China (NSFC) (No. 51208168), the Tianjin Natural Science Foundation (No. 13JCYBJC37700), the Youth Top-Notch Talent Plan of Hebei Province, China, the Fundamental Research Funds for the Central Universities (WUT: 2014-IV-068), and the Grant-in-Aid for Scientific Research Program (No. 10049) from the Japan Society for the Promotion of Science (JSPS).

Author information


Correspondence to Zhaozheng Hu.

Appendixes

A. Uniform sampling of the circle in 3D space

We can uniformly sample the circle in Eq. (19) by first sampling a standard circle and then mapping the sampled points through a rotation transform. This is implemented in the following three steps (a code sketch follows the steps):

Step 1. Uniformly sample the standard circle, which has the equation

$$ \left\{\begin{array}{l} X^2 + Y^2 = 1 \\ Z = 0 \end{array}\right. $$
(29)

This can be done by uniformly sampling the angle space, with a sampled point represented by ${\left[\begin{array}{ccc} \cos\theta_i & \sin\theta_i & 0 \end{array}\right]}^T$, where $\theta_i \in [0, 2\pi)$.

Step 2. Compute the rotation transform. The two circles defined by Eqs. (19) and (29) are mapped through a rotation matrix, which satisfies

$$ R{\left[\begin{array}{ccc} 0 & 0 & 1 \end{array}\right]}^T = N $$
(30)

The rotation matrix is not uniquely determined, because a circle is invariant to rotation about its normal. In practice, any two orthonormal vectors that are orthogonal to N can be used as the first and second column vectors.

Step 3. Transform the sampled points by the rotation. Each sampled point on the standard circle is transformed with the rotation matrix:

$$ R\left[\begin{array}{c} \cos\theta_i \\ \sin\theta_i \\ 0 \end{array}\right] = \left[\begin{array}{c} r_1\cos\theta_i + r_4\sin\theta_i \\ r_2\cos\theta_i + r_5\sin\theta_i \\ r_3\cos\theta_i + r_6\sin\theta_i \end{array}\right] $$
(31)

Hence, a uniform sampling of the circle in Eq. (19) is accomplished. The angle between two neighboring sampled points is identical before and after the transformation, because

$$ {\left(R\left[\begin{array}{c} \cos\theta_i \\ \sin\theta_i \\ 0 \end{array}\right]\right)}^T \left(R\left[\begin{array}{c} \cos\theta_{i-1} \\ \sin\theta_{i-1} \\ 0 \end{array}\right]\right) = {\left[\begin{array}{c} \cos\theta_i \\ \sin\theta_i \\ 0 \end{array}\right]}^T \left[\begin{array}{c} \cos\theta_{i-1} \\ \sin\theta_{i-1} \\ 0 \end{array}\right] $$
(32)
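For concreteness, here is a minimal NumPy sketch of the three steps above. It assumes the circle of Eq. (19) is a unit circle centered at the origin with unit normal N; the function name uniform_circle_samples and the sample count are illustrative, not from the paper.

```python
import numpy as np

def uniform_circle_samples(N, n_samples=360):
    """Uniformly sample a unit circle in 3D whose plane has unit normal N."""
    N = N / np.linalg.norm(N)
    # Step 2: build a rotation R with R [0 0 1]^T = N (Eq. (30)). Any two
    # orthonormal vectors orthogonal to N serve as the first two columns.
    a = np.array([1.0, 0, 0]) if abs(N[0]) < 0.9 else np.array([0, 1.0, 0])
    r_a = np.cross(N, a); r_a /= np.linalg.norm(r_a)
    r_b = np.cross(N, r_a)
    R = np.column_stack([r_a, r_b, N])   # det(R) = +1 by construction
    # Step 1: uniform angles theta_i in [0, 2*pi) on the standard circle (Eq. (29)).
    theta = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    pts = np.column_stack([np.cos(theta), np.sin(theta), np.zeros(n_samples)])
    # Step 3: rotate the samples onto the target circle (Eq. (31)).
    return pts @ R.T

samples = uniform_circle_samples(np.array([1.0, 2.0, 2.0]))
# Eq. (32): neighboring samples keep the same angular spacing after rotation.
print(np.dot(samples[0], samples[1]))  # = cos(2*pi/360), up to rounding
```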
B. Defining the searching space for the focal length

It is difficult to define the searching space of the focal length directly, since in practice images come in different resolutions. Instead, we define the searching space from the camera viewing angles. A camera has three viewing angles: horizontal, vertical, and diagonal. In this paper, we use the horizontal one, which is defined as

$$ \theta_h = 2\tan^{-1}\left(\frac{h}{2f}\right) $$
(33)

where h is the horizontal resolution of the image and f is the camera focal length. As a result, we can estimate the focal length from the viewing angle and the image resolution:

$$ f = \frac{h}{2\tan\left(\frac{\theta_h}{2}\right)} $$
(34)

Usually, we have some prior knowledge of commonly used lenses, so we can define the searching space of the focal length from the range of the viewing angle and the image resolution, as in the sketch below.
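As an illustration, the following sketch maps a range of horizontal viewing angles to candidate focal lengths via Eq. (34). The 30-90 degree bounds stand in for prior knowledge of commonly used lenses and are an assumption for the example, not values from the paper.

```python
import numpy as np

def focal_search_space(h_pixels, theta_min_deg=30.0, theta_max_deg=90.0, steps=100):
    """Candidate focal lengths (pixels) for a horizontal view-angle range, via Eq. (34)."""
    theta = np.radians(np.linspace(theta_min_deg, theta_max_deg, steps))
    # f = h / (2 tan(theta_h / 2)): a wider viewing angle means a shorter focal length.
    return h_pixels / (2.0 * np.tan(theta / 2.0))

# For a 1920-pixel-wide image, the searching space spans roughly 3583 px
# (30 degrees) down to 960 px (90 degrees).
f_candidates = focal_search_space(1920)
print(f_candidates[0], f_candidates[-1])
```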

C. Lemma: Determining the support plane is a necessary and sufficient condition for solving the P3P problem

Proof. 1) Necessary condition. When the P3P problem is solved, the distances of the three control points to the camera are known. Hence, we can determine the 3D coordinates of each control point from the following equations:

$$ \left\{\begin{array}{l} X_i = \lambda K^{-1}x_i \\ \left\Vert X_i\right\Vert = d_i \end{array}\right. $$
(35)

With the recovered control points, the support plane is then uniquely determined.

2) Sufficient condition. When the support plane is determined, its normal and distance are known. As a result, we can use Eq. (4) to compute the 3D position of each control point, and the distances of the three points to the camera then follow readily from their 3D coordinates. Hence, the P3P problem is solved. A code sketch of the necessary-condition step is given below.
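To illustrate the necessary-condition step, this sketch back-projects a control point via Eq. (35): the ray direction is K^{-1} x_i, and the scale lambda is fixed by the distance d_i from the solved P3P. The intrinsic matrix and all numeric values are made-up examples, not data from the paper.

```python
import numpy as np

def backproject(K, x_i, d_i):
    """Recover X_i with X_i = lambda * K^{-1} x_i and ||X_i|| = d_i (Eq. (35))."""
    ray = np.linalg.inv(K) @ x_i      # viewing ray through the pixel (un-normalized)
    lam = d_i / np.linalg.norm(ray)   # scale so the point lies at distance d_i
    return lam * ray

K = np.array([[800.0,   0.0, 320.0],   # example intrinsics: focal lengths, principal point
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
X_1 = backproject(K, np.array([350.0, 260.0, 1.0]), d_i=2.5)
print(np.linalg.norm(X_1))  # 2.5: the recovered point sits at the given distance
# With all three control points recovered this way, the support plane normal
# follows from the cross product of two in-plane difference vectors.
```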


Cite this article

Hu, Z., Matsuyama, T. Bayesian perspective-plane (BPP) with maximum likelihood searching for visual localization. Multimed Tools Appl 74, 9547–9572 (2015). https://doi.org/10.1007/s11042-014-2134-8


Keywords

Navigation