
Bayesian perspective-plane (BPP) with maximum likelihood searching for visual localization


Abstract

The "Perspective-Plane" problem proposed in this paper is similar to the well-known "Perspective-n-Point (PnP)" and "Perspective-n-Line (PnL)" problems in computer vision. However, it has broader applicability, because planar scenes are more widely available in daily life than control points or lines. We address this problem within the Bayesian framework and propose the "Bayesian Perspective-Plane (BPP)" algorithm, which can handle generalized constraints rather than type-specific ones. The BPP algorithm consists of three steps: 1) plane normal computation by maximum likelihood searching from the Bayesian formulation; 2) plane distance computation; and 3) visual localization. In the first step, computation of the plane normal is formulated within the Bayesian framework and solved with the proposed Maximum Likelihood Searching Model (MLS-M); 2D and 1D searching modes are discussed. MLS-M can incorporate generalized planar and out-of-plane deterministic constraints. With the computed normal, the plane distance is recovered from a reference length or distance, after which the positions of the object or the camera can be determined. Extensions of the BPP algorithm to un-calibrated images and to camera calibration are discussed. The algorithm has been tested with both simulated and real image data. In the experiments, it was applied to recover planar structure and localize objects using different types of constraints, and the 2D and 1D searching modes were illustrated for plane normal computation. The results demonstrate that the algorithm is accurate and generalizes well for object localization. Extensions of the proposed model to camera calibration were also illustrated, and the potential of the algorithm was further demonstrated by solving the classic Perspective-Three-Point (P3P) problem and classifying its solutions. The proposed BPP algorithm offers a practical and effective approach for visual localization.




Acknowledgments

The work presented in this paper was sponsored by the National Natural Science Foundation of China (NSFC) (No. 51208168), the Tianjin Natural Science Foundation (No. 13JCYBJC37700), the Youth Top-Notch Talent Plan of Hebei Province, China, the Fundamental Research Funds for the Central Universities (WUT: 2014-IV-068), and the Grant-in-Aid for Scientific Research Program (No. 10049) from the Japan Society for the Promotion of Science (JSPS).

Author information


Correspondence to Zhaozheng Hu.

Appendixes

A. Uniform sampling of the circle in 3D space

We can uniformly sample the circle in Eq. (19) by first sampling a standard circle and then mapping the sampled points through a rotation transform. This is implemented in the following three steps (a code sketch follows the steps):

Step 1. Uniformly sample the standard circle, which has the equation

$$ \left\{\begin{array}{l} X^2 + Y^2 = 1 \\ Z = 0 \end{array}\right. $$
(29)

This can be done by uniformly sampling the angle space, with a sampled point represented by ${\left[\begin{array}{ccc} \cos\theta_i & \sin\theta_i & 0 \end{array}\right]}^T$, where $\theta_i \in [0, 2\pi)$.

Step 2. Compute the rotation transform. The two circles defined by Eqs. (19) and (29) are mapped through a rotation matrix, which satisfies

$$ R{\left[\begin{array}{ccc} 0 & 0 & 1 \end{array}\right]}^T = N $$
(30)

The rotation matrix is not uniquely determined, because a circle is invariant to rotation about its normal. In practice, any two orthonormal vectors that are orthogonal to N can be used as the first and second column vectors.

Step 3. Transform the sampled points by the rotation. Each sampled point on the standard circle is transformed with the rotation matrix:

$$ R\left[\begin{array}{c} \cos\theta_i \\ \sin\theta_i \\ 0 \end{array}\right] = \left[\begin{array}{c} r_1\cos\theta_i + r_4\sin\theta_i \\ r_2\cos\theta_i + r_5\sin\theta_i \\ r_3\cos\theta_i + r_6\sin\theta_i \end{array}\right] $$
(31)

Hence, a uniform sampling of the circle in Eq. (19) is accomplished. The angle between two neighboring sampled points is identical before and after the transformation, because

$$ {\left(R\left[\begin{array}{c} \cos\theta_i \\ \sin\theta_i \\ 0 \end{array}\right]\right)}^T \left(R\left[\begin{array}{c} \cos\theta_{i-1} \\ \sin\theta_{i-1} \\ 0 \end{array}\right]\right) = {\left[\begin{array}{c} \cos\theta_i \\ \sin\theta_i \\ 0 \end{array}\right]}^T \left[\begin{array}{c} \cos\theta_{i-1} \\ \sin\theta_{i-1} \\ 0 \end{array}\right] $$
(32)
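For concreteness, here is a minimal NumPy sketch of the three steps above. It assumes the circle of Eq. (19) is a unit circle centered at the origin with unit normal N; the function name uniform_circle_samples and the sample count are illustrative, not from the paper.

```python
import numpy as np

def uniform_circle_samples(N, n_samples=360):
    """Uniformly sample a unit circle in 3D whose plane has unit normal N."""
    N = N / np.linalg.norm(N)
    # Step 2: build a rotation R with R [0 0 1]^T = N (Eq. (30)). Any two
    # orthonormal vectors orthogonal to N serve as the first two columns.
    a = np.array([1.0, 0, 0]) if abs(N[0]) < 0.9 else np.array([0, 1.0, 0])
    r_a = np.cross(N, a); r_a /= np.linalg.norm(r_a)
    r_b = np.cross(N, r_a)
    R = np.column_stack([r_a, r_b, N])   # det(R) = +1 by construction
    # Step 1: uniform angles theta_i in [0, 2*pi) on the standard circle (Eq. (29)).
    theta = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    pts = np.column_stack([np.cos(theta), np.sin(theta), np.zeros(n_samples)])
    # Step 3: rotate the samples onto the target circle (Eq. (31)).
    return pts @ R.T

samples = uniform_circle_samples(np.array([1.0, 2.0, 2.0]))
# Eq. (32): neighboring samples keep the same angular spacing after rotation.
print(np.dot(samples[0], samples[1]))  # = cos(2*pi/360), up to rounding
```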
B. Defining the searching space for the focal length

It is difficult to define the searching space of the focal length directly, since in practice images come in different resolutions. Instead, we define the searching space from the camera viewing angles. A camera has three viewing angles: horizontal, vertical, and diagonal. In this paper, we use the horizontal one, which is defined as

$$ \theta_h = 2\tan^{-1}\left(\frac{h}{2f}\right) $$
(33)

where h is the horizontal resolution of the image and f is the camera focal length. As a result, we can estimate the focal length from the viewing angle and the image resolution:

$$ f = \frac{h}{2\tan\left(\frac{\theta_h}{2}\right)} $$
(34)

Usually, we have some prior knowledge of commonly used lenses, so we can define the searching space of the focal length from the range of the viewing angle and the image resolution, as in the sketch below.
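As an illustration, the following sketch maps a range of horizontal viewing angles to candidate focal lengths via Eq. (34). The 30-90 degree bounds stand in for prior knowledge of commonly used lenses and are an assumption for the example, not values from the paper.

```python
import numpy as np

def focal_search_space(h_pixels, theta_min_deg=30.0, theta_max_deg=90.0, steps=100):
    """Candidate focal lengths (pixels) for a horizontal view-angle range, via Eq. (34)."""
    theta = np.radians(np.linspace(theta_min_deg, theta_max_deg, steps))
    # f = h / (2 tan(theta_h / 2)): a wider viewing angle means a shorter focal length.
    return h_pixels / (2.0 * np.tan(theta / 2.0))

# For a 1920-pixel-wide image, the searching space spans roughly 3583 px
# (30 degrees) down to 960 px (90 degrees).
f_candidates = focal_search_space(1920)
print(f_candidates[0], f_candidates[-1])
```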

C. Lemma: Determining the support plane is a necessary and sufficient condition for solving the P3P problem

Proof. 1) Necessary condition. When the P3P problem is solved, the distances of the three control points to the camera are known. Hence, we can determine the 3D coordinates of each control point from the following equations:

$$ \left\{\begin{array}{l} X_i = \lambda K^{-1}x_i \\ \left\Vert X_i\right\Vert = d_i \end{array}\right. $$
(35)

With the recovered control points, the support plane is then uniquely determined.

2) Sufficient condition. When the support plane is determined, its normal and distance are known. As a result, we can use Eq. (4) to compute the 3D position of each control point, and the distances of the three points to the camera then follow readily from their 3D coordinates. Hence, the P3P problem is solved. A code sketch of the necessary-condition step is given below.
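To illustrate the necessary-condition step, this sketch back-projects a control point via Eq. (35): the ray direction is K^{-1} x_i, and the scale lambda is fixed by the distance d_i from the solved P3P. The intrinsic matrix and all numeric values are made-up examples, not data from the paper.

```python
import numpy as np

def backproject(K, x_i, d_i):
    """Recover X_i with X_i = lambda * K^{-1} x_i and ||X_i|| = d_i (Eq. (35))."""
    ray = np.linalg.inv(K) @ x_i      # viewing ray through the pixel (un-normalized)
    lam = d_i / np.linalg.norm(ray)   # scale so the point lies at distance d_i
    return lam * ray

K = np.array([[800.0,   0.0, 320.0],   # example intrinsics: focal lengths, principal point
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
X_1 = backproject(K, np.array([350.0, 260.0, 1.0]), d_i=2.5)
print(np.linalg.norm(X_1))  # 2.5: the recovered point sits at the given distance
# With all three control points recovered this way, the support plane normal
# follows from the cross product of two in-plane difference vectors.
```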


Cite this article

Hu, Z., Matsuyama, T. Bayesian perspective-plane (BPP) with maximum likelihood searching for visual localization. Multimed Tools Appl 74, 9547–9572 (2015). https://doi.org/10.1007/s11042-014-2134-8


Keywords

Navigation