
1 Introduction

The principal task in augmented reality (AR) is to localize the camera with respect to the scene being viewed and to superimpose virtual content from the estimated camera location. Positioning a camera requires tracking features in the captured images. Based on the positions of these features in the real environment and in the image, the exterior pose of the camera can be obtained and augmentation achieved. This work investigates camera pose estimation using two common geometric primitives, namely circles and lines.

Several research works have been dedicated to extrinsic camera calibration from circle-ellipse correspondences. The main idea in [1] is the relationship between the normal vector and center of a detected ellipse and those of the corresponding world circle; it considers a single view of two coplanar circles with unknown radii. However, this approach suffers from an ambiguity of solutions. In [2], the ambiguity is handled by a marker with a specific design that distinguishes the correct pose from the set of possible solutions. Using equiform transformations, [3, 4] present a similar approach with two coplanar conics, although the pose they estimate must be refined by an iterative process. Another approach, using two coplanar circles with known radii, is presented in [5]. It aims for a closed-form exterior calibration based on the pole-polar constraints of the image of the absolute conic and of both circles, and is reported to be useful as an initialization for other iterative methods.

A large number of methods are based on the geometric characteristics of lines, particularly the coplanarity of a line on the image plane with the corresponding line in the world frame, such as [6, 7]. For instance, [8, 13] solve for the camera pose using the fact that both the 2D and the 3D line must be perpendicular to the normal of the plane containing these lines and the optical center. In addition, the approaches OPnPL and EPnPL presented in [9] build on their point-only counterparts to estimate the camera pose using lines as well. The noniterative method RPnL [10] defines an intermediate frame between the object and camera frames to reduce the complexity of the algorithm. ASPnL [11], inspired by RPnL [10], decreases the complexity even further, to almost linear in the number of lines, by adding a second intermediate frame. Although ASPnL [11] outperforms the above-mentioned methods in terms of accuracy, it is highly sensitive to noise and outliers in large line sets.

2 Basic Concept of Camera Pose Estimation Using Circle and Line Primitives

2.1 Pose Estimation from Circle-Ellipse Correspondences

What follows in this section summarizes the concepts in [1, 12] that are used in the experiments reported in Sect. 3.1. Consider a circle with known center \(C_w\) and known normal vector \(N_w\) to its supporting plane, both expressed in the world frame. Under perspective projection, the image of this circle on the image plane is, in a general configuration, an ellipse. The rotation matrix \(R\) and the translation vector \(t\) satisfy:

$$\begin{aligned} N_c = RN_w \end{aligned}$$
(1)
$$\begin{aligned} C_c = RC_w + t \end{aligned}$$
(2)

where \(C_c\) is the center and \(N_c\) is the normal vector to the supporting plane of the circle in the camera coordinate frame [12]. First, the normal vector and the center of the circle in the camera frame are computed from the ellipse equation in matrix form (3), where \(x\) denotes an image point in homogeneous coordinates [1].

$$\begin{aligned} x^T \begin{bmatrix} A&B&D \\ B&C&E \\ D&E&F \\ \end{bmatrix} x = 0 \end{aligned}$$
(3)

Let the ellipse be the base of an oblique elliptical cone with the optical center as its apex; see Fig. 1(a). With the image plane placed at \(z = f\), where \(f\) is the focal length, every point on the oblique elliptical cone can be expressed as:

$$\begin{aligned} P = k(x_e,y_e,f)^T \end{aligned}$$
(4)

where \(k\) is a scale factor indicating the distance of each point of the cone from the origin of the camera frame. From (3) and (4), the matrix of the oblique elliptical cone is:

$$\begin{aligned} Q = \begin{bmatrix} A&B&\frac{D}{f} \\ B&C&\frac{E}{f} \\ \frac{D}{f}&\frac{E}{f}&\frac{F}{f^2} \\ \end{bmatrix} \end{aligned}$$
(5)
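
As a minimal illustration (not part of the original works), the mapping from the conic coefficients of (3) to the cone matrix of (5) can be sketched as follows in Python/NumPy; the function names are ours, and the ellipse is assumed to be written as \(Ax^2 + 2Bxy + Cy^2 + 2Dx + 2Ey + F = 0\).

```python
import numpy as np

def conic_matrix(A, B, C, D, E, F):
    """Symmetric matrix of the conic A*x^2 + 2*B*x*y + C*y^2
    + 2*D*x + 2*E*y + F = 0, as in Eq. (3)."""
    return np.array([[A, B, D],
                     [B, C, E],
                     [D, E, F]], dtype=float)

def cone_matrix(conic, f):
    """Oblique elliptical cone matrix Q of Eq. (5), with the image
    plane at z = f (focal length, same units as x and y)."""
    Q = conic.copy()
    Q[0, 2] = Q[2, 0] = conic[0, 2] / f   # D -> D/f
    Q[1, 2] = Q[2, 1] = conic[1, 2] / f   # E -> E/f
    Q[2, 2] = conic[2, 2] / f**2          # F -> F/f^2
    return Q
```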

By decomposing \(Q\), we obtain the eigenvalues \(D = \mathrm{diag}(\lambda _{1},\lambda _{2},\lambda _{3})\) and the eigenvectors \(V = (V_{1},V_{2},V_{3})\) of the oblique elliptical cone. Considering the orthogonality constraint of the rotation matrix, the normal vector and the center of the circle in the camera coordinate frame are calculated as follows.

$$\begin{aligned} N = V \begin{bmatrix} S_2\sqrt{\frac{\lambda _1-\lambda _2}{\lambda _1-\lambda _3}} \\ 0 \\ -S_1\sqrt{\frac{\lambda _2-\lambda _3}{\lambda _1-\lambda _3}} \end{bmatrix} \end{aligned}$$
(6)
$$\begin{aligned} C = \frac{S_3\lambda _2 r}{\sqrt{-\lambda _1\lambda _3}}\, V \begin{bmatrix} S_2\frac{\lambda _3}{\lambda _2}\sqrt{\frac{\lambda _1-\lambda _2}{\lambda _1-\lambda _3}} \\ 0 \\ -S_1\frac{\lambda _1}{\lambda _2}\sqrt{\frac{\lambda _2-\lambda _3}{\lambda _1-\lambda _3}} \end{bmatrix} \end{aligned}$$
(7)

\(S_1,S_2,S_3\) are either \(+1\) or \(-1\), giving eight possible sets of solutions for \(N\) and \(C\); here the eigenvalues are ordered such that \(\lambda _1 \ge \lambda _2> 0 > \lambda _3\), and the parameter \(r\) is the radius of the circle. Based on the coordinate system configuration, the incorrect solutions are eliminated under the conditions in (8) and (9). These conditions ensure that the circle is facing towards the camera.

$$\begin{aligned} C \cdot \begin{bmatrix}0&0&1\end{bmatrix}^T < 0 \end{aligned}$$
(8)
$$\begin{aligned} N \cdot \begin{bmatrix}0&0&1\end{bmatrix}^T > 0 \end{aligned}$$
(9)
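
The candidate enumeration can be sketched as follows (again in Python/NumPy, under the eigenvalue ordering above and the reconstructed forms of (6) and (7); the function name is ours):

```python
import itertools
import numpy as np

def circle_pose_candidates(Q, r):
    """Candidate (N, C) pairs in the camera frame from the cone matrix Q
    of Eq. (5) and the known circle radius r, pruned by (8) and (9)."""
    lam, V = np.linalg.eigh(Q)            # Q is symmetric
    if np.sum(lam > 0) < 2:               # normalize the sign of Q so that
        lam = -lam                        # two eigenvalues are positive
    order = np.argsort(lam)[::-1]         # lambda1 >= lambda2 > 0 > lambda3
    lam, V = lam[order], V[:, order]
    l1, l2, l3 = lam
    g = np.sqrt((l1 - l2) / (l1 - l3))    # recurring terms of (6) and (7)
    h = np.sqrt((l2 - l3) / (l1 - l3))
    candidates = []
    for s1, s2, s3 in itertools.product((1, -1), repeat=3):  # eight sign sets
        N = V @ np.array([s2 * g, 0.0, -s1 * h])
        z0 = s3 * l2 * r / np.sqrt(-l1 * l3)
        C = z0 * (V @ np.array([s2 * (l3 / l2) * g, 0.0, -s1 * (l1 / l2) * h]))
        if C[2] < 0 and N[2] > 0:         # conditions (8) and (9)
            candidates.append((N, C))
    return candidates
```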

By constructing a system of linear equations from (1) and (2), each contributing three scalar equations per correspondence, at least two circle-ellipse correspondences are required to solve for the twelve unknown parameters of the rotation matrix and the translation vector.
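
A hedged sketch of this final linear solve, assuming the \((N_c, C_c)\) pairs of both circles have already been resolved; since the linear solution need not be exactly orthogonal (an issue also observed in Sect. 3.1), the sketch projects it onto the nearest rotation via SVD as one possible remedy:

```python
import numpy as np

def solve_pose_linear(Nw, Cw, Nc, Cc):
    """Linear solve for R (9 unknowns, row-major) and t (3 unknowns)
    from Eqs. (1) and (2), given k >= 2 circle correspondences.
    Nw, Cw: (k, 3) world normals/centers; Nc, Cc: camera-frame ones."""
    rows, rhs = [], []
    for i in range(len(Nw)):
        for j in range(3):                          # one scalar equation per axis
            a = np.zeros(12); a[3*j:3*j+3] = Nw[i]  # Eq. (1): R[j,:] . Nw = Nc[j]
            rows.append(a); rhs.append(Nc[i, j])
            a = np.zeros(12); a[3*j:3*j+3] = Cw[i]  # Eq. (2): R[j,:] . Cw + t[j] = Cc[j]
            a[9 + j] = 1.0
            rows.append(a); rhs.append(Cc[i, j])
    x, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    R, t = x[:9].reshape(3, 3), x[9:]
    U, _, Vt = np.linalg.svd(R)                     # project onto the nearest
    R = U @ np.diag([1, 1, np.linalg.det(U @ Vt)]) @ Vt  # proper rotation
    return R, t
```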

2.2 Pose Estimation from Straight Line Correspondences

What follows summarizes the concepts in [13] that are used in the experiments reported in Sect. 3.2. In the world coordinate frame, consider a line \(L\) with direction vector \(V\) and a point \(P\) on it. The corresponding line \(l\) on the image plane has direction vector \(v\) and a point \(p\) on it, expressed in the camera coordinate system, as in Fig. 1(b). With the rotation matrix \(R\) and the translation vector \(t\), the relationship between the two frames can be formulated as:

$$\begin{aligned} v = RV \end{aligned}$$
(10)
$$\begin{aligned} p = RP + t \end{aligned}$$
(11)

Considering an object line and its image on the image plane, we define the plane containing the two lines and the optical center of the camera. Due to this coplanarity constraint, the normal vector of the plane is perpendicular to the direction vectors of the corresponding lines and to the vectors from the optical center to the points on them. The normal vector \(n\) of this plane is the cross product of \(v\) and \(p\), which yields:

$$\begin{aligned} n \cdot (RV) = 0 \end{aligned}$$
(12)
$$\begin{aligned} n \cdot (RP + t) = 0 \end{aligned}$$
(13)

First, to solve for the rotation, (12) is minimized by iteratively employing the nonlinear trust-region optimization. In each iteration, this algorithm searches for the local minimum of an approximating function within a specific interval, and enlarges the interval when the local minimum is smaller than the function value at the center of the interval; the search continues until it has converged [14]. Due to the rotation ambiguity, it may not converge to the optimal rotation in the first run. Since it converges to a minimum rapidly, we recommend running the trust-region method repeatedly and choosing the best estimate, selected based on the coordinate system configuration, to find the global minimum. Having recovered the rotation, the translation vector is computed by solving the system of linear equations derived from (13). There are six degrees of freedom, three for the angles of rotation and three for the translation vector; since each step is solved separately, pose estimation requires at least three line correspondences.
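
A minimal sketch of this two-step scheme in Python/SciPy; the axis-angle parameterization, the specific trust-region solver (method='trf'), and the function name are our illustrative choices, not necessarily those of [13, 14]:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def pose_from_lines(n, V, P, rotvec0=np.zeros(3)):
    """Pose from m >= 3 line correspondences in the spirit of Sect. 2.2.
    n: (m, 3) plane normals n_i = v_i x p_i (one per image line)
    V: (m, 3) 3D line directions; P: (m, 3) points on the 3D lines."""
    def residuals(rvec):                  # Eq. (12): n . (R V) = 0
        R = Rotation.from_rotvec(rvec).as_matrix()
        return np.einsum('ij,ij->i', n, V @ R.T)
    # Trust-region solve; in practice, restart from several rotvec0 values
    # and keep the best result, as recommended above.
    sol = least_squares(residuals, rotvec0, method='trf')
    R = Rotation.from_rotvec(sol.x).as_matrix()
    # Eq. (13): n . (R P + t) = 0  =>  n t = -n . (R P), linear in t
    b = -np.einsum('ij,ij->i', n, P @ R.T)
    t, *_ = np.linalg.lstsq(n, b, rcond=None)
    return R, t
```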

Fig. 1. Circle-ellipse (a) and straight line (b) correspondences.

3 Experimental Results

An intrinsically calibrated Basler acA2040-25gm GigE color camera with a Kowa lens is used in our experiments. The pixel size of the camera is \(5.5\,\upmu \mathrm{m} \times 5.5\,\upmu \mathrm{m}\). The experiments are carried out on Windows 7 with Matlab R2016a.

Feature Detection. Ellipses are detected with the method provided in [15], which finds the ellipse equation by minimizing the dual ellipse in a neighborhood of high-gradient point candidates. Moreover, EDLines [16] is employed for straight line detection; this method detects straight line segments by evaluating edge segments based on criteria explained in detail in [16].

3.1 Experiments Using Circles

This experiment estimates the pose of the camera using two circles of different radii attached to a surface. The radius of the larger circle is 100 mm and that of the smaller one is 75 mm, see Fig. 2. The standard calibration approach of [17] using a checkerboard pattern provides the ground truth pose. As seen in Table 1, the estimated pose is unsatisfactory in comparison to the rotation and translation recovered using the method in [17] with the checkerboard; this method of pose estimation therefore does not yield a reasonable and acceptable exterior orientation and position. In addition, although the approach is supposed to enforce the orthogonality constraints on the rotation matrix, we observe that the estimated transformation does not satisfy these constraints. A factor influencing this poor performance could be the normal vectors and centers of the circles computed from the ellipse equations; these parameters cannot be validated, since no ground truth information is available for their actual values, and the authors of [1] do not present an evaluation method for them either. We conducted further experiments on circles with various radii and several orientations and positions; however, no improvement was observed.

Table 1. Estimated and ground truth [17] pose using the two circles.
Fig. 2. Experiment with two circles; detected circles marked in red. (Color figure online)

Fig. 3. Setup showing a metallic frame made of straight lines that are used for pose estimation (a). The camera poses with respect to the object reference frame (b).

Fig. 4. Reprojected bunny in three different camera positions and orientations, using the estimated pose (red) and the ground truth pose (yellow), with a closer view below each image. (Color figure online)

3.2 Experiments Using Straight Lines

This experiment is conducted with 21 different poses of the camera, illustrated in Fig. 3(b). In order to evaluate the impact of the number of lines on the accuracy of the pose estimation, we performed the experiment with three different line sets, containing 5, 10, and 15 to 25 lines (the last depending on the visibility of the detected lines in each image). Figure 4 depicts a 3D bunny reprojected onto the image plane with the ground truth pose (yellow bunny) and the estimated pose (red bunny) in three different camera positions and orientations. Visual inspection of the reprojections reveals that the estimated pose is accurate, at least qualitatively. In order to evaluate our results quantitatively, the ground truth pose was obtained with the calibration method of [17], using a checkerboard pattern for every single image. The reprojection error \(e\) is defined as the normalized sum of distances between the reprojections under the estimated and the ground truth poses:

$$\begin{aligned} e = \frac{1}{n}\sum _{i=1}^{n} \left\Vert P_{i\,Est} - P_{i\,GT}\right\Vert \end{aligned}$$
(14)

where \(n\) is the number of points in the 3D point cloud, \(P_{Est}\) denotes the points reprojected using the estimated pose, and \(P_{GT}\) those reprojected using the ground truth pose. The average reprojection error is 0.0015 px when using 5 or 10 lines, and 0.0016 px when using 15 to 25 lines, see Fig. 5(a). The slightly less accurate results of the third line set may be caused by the level of noise increasing as more lines are used; this noise stems from inaccuracies in the manual measurements or in the line detection in the image. The average execution time is 0.27 s for every line set, see Fig. 5(b), indicating that the method is efficient, with a fairly consistent run time regardless of the number of lines or the position of the camera.
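
For reference, (14) amounts to the mean Euclidean distance between the two reprojections; a short sketch (our naming):

```python
import numpy as np

def reprojection_error(P_est, P_gt):
    """Mean Euclidean distance between the two reprojections, Eq. (14).
    P_est, P_gt: (n, 2) arrays of reprojected image points in pixels."""
    return np.mean(np.linalg.norm(P_est - P_gt, axis=1))
```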

Fig. 5. Reprojection error (a) and execution time (b) comparison of the three line sets.

4 Conclusion

In this paper, we investigated methods of camera pose estimation using circle and line features. The experimental results on pose estimation from circle-ellipse correspondences demonstrate that this approach does not provide reliable solutions. For straight lines, however, the small reprojection error between the estimated and the ground truth poses confirms that the presented approach is accurate and its execution time is short; we hypothesize that the method could be useful for AR applications. In addition, it is suitable for pose estimation from varying numbers of lines. Finding correspondences between 2D and 3D information is one of the challenges that requires further investigation; an automatic algorithm to pair the correspondences would be helpful in real-time applications and in video sequences as well. Furthermore, employing more complex features (such as combinations of intersecting lines, or combinations of lines and circles) may improve the robustness of the estimation.