
1 Introduction

The principal task in augmented reality (AR) is to localize the camera with respect to the scene being viewed and to superimpose virtual content from the estimated camera location. Positioning a camera requires tracking features in the captured images. Based on the positions of these features in the real environment and in the image, the exterior pose of the camera can be obtained and augmentation achieved. This work investigates camera pose estimation using two common geometric primitives, namely circles and lines.

Several research works have been dedicated to extrinsic camera calibration from circle-ellipse correspondences. The main idea in [1] is the relationship between the normal vector and center of a detected ellipse and those of the corresponding world circle; it considers a single view of two coplanar circles with unknown radii. However, this approach suffers from an ambiguity of solutions. In [2], the ambiguity is handled by a marker with a specific design that distinguishes the correct pose from the set of possible solutions. Using equiform transformations, [3, 4] present a similar approach with two coplanar conics, although the pose they estimate must be refined by an iterative process. Another approach, using two coplanar circles with known radii, is presented in [5]. It aims for a closed-form exterior calibration based on the pole-polar constraints of the image of the absolute conic and of both circles, and is reported to be useful as an initialization for other iterative methods.

A large number of methods are based on the geometric characteristics of lines, particularly the coplanarity of a line on the image plane with the corresponding line in the world frame, such as [6, 7]. For instance, [8, 13] solve for the camera pose using the fact that both the 2D and the 3D line must be perpendicular to the normal of the plane containing these lines and the optical center. In addition, the approaches OPnPL and EPnPL presented in [9] build on their point-only counterparts to estimate the camera pose using lines as well. The noniterative method RPnL [10] defines an intermediate frame between the object and camera frames to reduce the complexity of the algorithm. ASPnL [11], inspired by RPnL [10], decreases the complexity even further, to almost linear in the number of lines, by adding a second intermediate frame. Although ASPnL [11] outperforms the above-mentioned methods in terms of accuracy, it is highly sensitive to noise and outliers in large line sets.

2 Basic Concept of Camera Pose Estimation Using Circle and Line Primitives

2.1 Pose Estimation from Circle-Ellipse Correspondences

What follows in this section summarizes the concepts in [1, 12] that are used in the experiments reported in Sect. 3.1. Consider a circle with known center \(C_w\) and known normal vector \(N_w\) to its supporting plane, both expressed in the world frame. Under perspective projection, the image of this circle on the image plane is, in a general configuration, an ellipse. The rotation matrix \(R\) and the translation vector \(t\) satisfy:

$$\begin{aligned} N_c = RN_w \end{aligned}$$
(1)
$$\begin{aligned} C_c = RC_w + t \end{aligned}$$
(2)

where \(C_c\) is the center and \(N_c\) is the normal vector to the supporting plane of the circle in the camera coordinate frame [12]. First, the normal vector and the center of the circle in the camera frame are computed from the ellipse equation in matrix form (3), where \(x\) denotes an image point in homogeneous coordinates [1].

$$\begin{aligned} x^T \begin{bmatrix} A&B&D \\ B&C&E \\ D&E&F \\ \end{bmatrix} x = 0 \end{aligned}$$
(3)

Let the ellipse be the base of an oblique elliptical cone with the optical center as its apex; see Fig. 1(a). With the image plane placed at \(z = f\), where \(f\) is the focal length, every point on the oblique elliptical cone can be expressed as:

$$\begin{aligned} P = k(x_e,y_e,f)^T \end{aligned}$$
(4)

where \(k\) is a scale factor indicating the distance of each point of the cone from the origin of the camera frame. From (3) and (4), the matrix of the oblique elliptical cone is:

$$\begin{aligned} Q = \begin{bmatrix} A&B&\frac{D}{f} \\ B&C&\frac{E}{f} \\ \frac{D}{f}&\frac{E}{f}&\frac{F}{f^2} \\ \end{bmatrix} \end{aligned}$$
(5)
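
As a minimal illustration (not part of the original works), the mapping from the conic coefficients of (3) to the cone matrix of (5) can be sketched as follows in Python/NumPy; the function names are ours, and the ellipse is assumed to be written as \(Ax^2 + 2Bxy + Cy^2 + 2Dx + 2Ey + F = 0\).

```python
import numpy as np

def conic_matrix(A, B, C, D, E, F):
    """Symmetric matrix of the conic A*x^2 + 2*B*x*y + C*y^2
    + 2*D*x + 2*E*y + F = 0, as in Eq. (3)."""
    return np.array([[A, B, D],
                     [B, C, E],
                     [D, E, F]], dtype=float)

def cone_matrix(conic, f):
    """Oblique elliptical cone matrix Q of Eq. (5), with the image
    plane at z = f (focal length, same units as x and y)."""
    Q = conic.copy()
    Q[0, 2] = Q[2, 0] = conic[0, 2] / f   # D -> D/f
    Q[1, 2] = Q[2, 1] = conic[1, 2] / f   # E -> E/f
    Q[2, 2] = conic[2, 2] / f**2          # F -> F/f^2
    return Q
```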

By decomposing \(Q\), we obtain the eigenvalues \(D = \mathrm{diag}(\lambda _{1},\lambda _{2},\lambda _{3})\) and the eigenvectors \(V = (V_{1},V_{2},V_{3})\) of the oblique elliptical cone. Considering the orthogonality constraint of the rotation matrix, the normal vector and the center of the circle in the camera coordinate frame are calculated as follows.

$$\begin{aligned} N = V \begin{bmatrix} S_2\sqrt{\frac{\lambda _1-\lambda _2}{\lambda _1-\lambda _3}} \\ 0 \\ -S_1\sqrt{\frac{\lambda _2-\lambda _3}{\lambda _1-\lambda _3}} \end{bmatrix} \end{aligned}$$
(6)
$$\begin{aligned} C = \frac{S_3\lambda _2 r}{\sqrt{-\lambda _1\lambda _3}}\, V \begin{bmatrix} S_2\frac{\lambda _3}{\lambda _2}\sqrt{\frac{\lambda _1-\lambda _2}{\lambda _1-\lambda _3}} \\ 0 \\ -S_1\frac{\lambda _1}{\lambda _2}\sqrt{\frac{\lambda _2-\lambda _3}{\lambda _1-\lambda _3}} \end{bmatrix} \end{aligned}$$
(7)

\(S_1,S_2,S_3\) are either \(+1\) or \(-1\), giving eight possible sets of solutions for \(N\) and \(C\); here the eigenvalues are ordered such that \(\lambda _1 \ge \lambda _2> 0 > \lambda _3\), and the parameter \(r\) is the radius of the circle. Based on the coordinate system configuration, the incorrect solutions are eliminated under the conditions in (8) and (9). These conditions ensure that the circle is facing towards the camera.

$$\begin{aligned} C \cdot \begin{bmatrix}0&0&1\end{bmatrix}^T < 0 \end{aligned}$$
(8)
$$\begin{aligned} N \cdot \begin{bmatrix}0&0&1\end{bmatrix}^T > 0 \end{aligned}$$
(9)
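
The candidate enumeration can be sketched as follows (again in Python/NumPy, under the eigenvalue ordering above and the reconstructed forms of (6) and (7); the function name is ours):

```python
import itertools
import numpy as np

def circle_pose_candidates(Q, r):
    """Candidate (N, C) pairs in the camera frame from the cone matrix Q
    of Eq. (5) and the known circle radius r, pruned by (8) and (9)."""
    lam, V = np.linalg.eigh(Q)            # Q is symmetric
    if np.sum(lam > 0) < 2:               # normalize the sign of Q so that
        lam = -lam                        # two eigenvalues are positive
    order = np.argsort(lam)[::-1]         # lambda1 >= lambda2 > 0 > lambda3
    lam, V = lam[order], V[:, order]
    l1, l2, l3 = lam
    g = np.sqrt((l1 - l2) / (l1 - l3))    # recurring terms of (6) and (7)
    h = np.sqrt((l2 - l3) / (l1 - l3))
    candidates = []
    for s1, s2, s3 in itertools.product((1, -1), repeat=3):  # eight sign sets
        N = V @ np.array([s2 * g, 0.0, -s1 * h])
        z0 = s3 * l2 * r / np.sqrt(-l1 * l3)
        C = z0 * (V @ np.array([s2 * (l3 / l2) * g, 0.0, -s1 * (l1 / l2) * h]))
        if C[2] < 0 and N[2] > 0:         # conditions (8) and (9)
            candidates.append((N, C))
    return candidates
```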

By constructing a system of linear equations from (1) and (2), each contributing three scalar equations per correspondence, at least two circle-ellipse correspondences are required to solve for the twelve unknown parameters of the rotation matrix and the translation vector.
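
A hedged sketch of this final linear solve, assuming the \((N_c, C_c)\) pairs of both circles have already been resolved; since the linear solution need not be exactly orthogonal (an issue also observed in Sect. 3.1), the sketch projects it onto the nearest rotation via SVD as one possible remedy:

```python
import numpy as np

def solve_pose_linear(Nw, Cw, Nc, Cc):
    """Linear solve for R (9 unknowns, row-major) and t (3 unknowns)
    from Eqs. (1) and (2), given k >= 2 circle correspondences.
    Nw, Cw: (k, 3) world normals/centers; Nc, Cc: camera-frame ones."""
    rows, rhs = [], []
    for i in range(len(Nw)):
        for j in range(3):                          # one scalar equation per axis
            a = np.zeros(12); a[3*j:3*j+3] = Nw[i]  # Eq. (1): R[j,:] . Nw = Nc[j]
            rows.append(a); rhs.append(Nc[i, j])
            a = np.zeros(12); a[3*j:3*j+3] = Cw[i]  # Eq. (2): R[j,:] . Cw + t[j] = Cc[j]
            a[9 + j] = 1.0
            rows.append(a); rhs.append(Cc[i, j])
    x, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    R, t = x[:9].reshape(3, 3), x[9:]
    U, _, Vt = np.linalg.svd(R)                     # project onto the nearest
    R = U @ np.diag([1, 1, np.linalg.det(U @ Vt)]) @ Vt  # proper rotation
    return R, t
```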

2.2 Pose Estimation from Straight Line Correspondences

What follows summarizes the concepts in [13] that are used in the experiments reported in Sect. 3.2. In the world coordinate frame, consider a line \(L\) with direction vector \(V\) and a point \(P\) on it. The corresponding line \(l\) on the image plane has direction vector \(v\) and a point \(p\) on it, expressed in the camera coordinate system, as in Fig. 1(b). With the rotation matrix \(R\) and the translation vector \(t\), the relationship between the two frames can be formulated as:

$$\begin{aligned} v = RV \end{aligned}$$
(10)
$$\begin{aligned} p = RP + t \end{aligned}$$
(11)

Considering an object line and its image on the image plane, we define the plane containing the two lines and the optical center of the camera. Due to this coplanarity constraint, the normal vector of the plane is perpendicular to the direction vectors of the corresponding lines and to the vectors from the optical center to the points on them. The normal vector \(n\) of this plane is the cross product of \(v\) and \(p\), which yields:

$$\begin{aligned} n \cdot (RV) = 0 \end{aligned}$$
(12)
$$\begin{aligned} n \cdot (RP + t) = 0 \end{aligned}$$
(13)

First, to solve for the rotation, (12) is minimized by iteratively employing the nonlinear trust-region optimization. In each iteration, this algorithm searches for the local minimum of an approximating function within a specific interval, and enlarges the interval when the local minimum is smaller than the function value at the center of the interval; the search continues until it has converged [14]. Due to the rotation ambiguity, it may not converge to the optimal rotation in the first run. Since it converges to a minimum rapidly, we recommend running the trust-region method repeatedly and choosing the best estimate, selected based on the coordinate system configuration, to find the global minimum. Having recovered the rotation, the translation vector is computed by solving the system of linear equations derived from (13). There are six degrees of freedom, three for the angles of rotation and three for the translation vector; since each step is solved separately, pose estimation requires at least three line correspondences.
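
A minimal sketch of this two-step scheme in Python/SciPy; the axis-angle parameterization, the specific trust-region solver (method='trf'), and the function name are our illustrative choices, not necessarily those of [13, 14]:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def pose_from_lines(n, V, P, rotvec0=np.zeros(3)):
    """Pose from m >= 3 line correspondences in the spirit of Sect. 2.2.
    n: (m, 3) plane normals n_i = v_i x p_i (one per image line)
    V: (m, 3) 3D line directions; P: (m, 3) points on the 3D lines."""
    def residuals(rvec):                  # Eq. (12): n . (R V) = 0
        R = Rotation.from_rotvec(rvec).as_matrix()
        return np.einsum('ij,ij->i', n, V @ R.T)
    # Trust-region solve; in practice, restart from several rotvec0 values
    # and keep the best result, as recommended above.
    sol = least_squares(residuals, rotvec0, method='trf')
    R = Rotation.from_rotvec(sol.x).as_matrix()
    # Eq. (13): n . (R P + t) = 0  =>  n t = -n . (R P), linear in t
    b = -np.einsum('ij,ij->i', n, P @ R.T)
    t, *_ = np.linalg.lstsq(n, b, rcond=None)
    return R, t
```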

Fig. 1. Circle-ellipse (a) and straight line (b) correspondences.

3 Experimental Results

An intrinsically calibrated Basler acA2040-25gm GigE color camera with a Kowa lens is used in our experiments. The pixel size of the camera is \(5.5\,\upmu \mathrm{m} \times 5.5\,\upmu \mathrm{m}\). The experiments are carried out on Windows 7 with Matlab R2016a.

Feature Detection. Ellipses are detected with the method provided in [15], which finds the ellipse equation by minimizing the dual ellipse in a neighborhood of high-gradient point candidates. Moreover, EDLines [16] is employed for straight line detection; this method detects straight line segments by evaluating edge segments based on criteria explained in detail in [16].

3.1 Experiments Using Circles

This experiment estimates the pose of the camera using two circles of different radii attached to a surface. The radius of the larger circle is 100 mm and that of the smaller one is 75 mm, see Fig. 2. The standard calibration approach of [17] using a checkerboard pattern provides the ground truth pose. As seen in Table 1, the estimated pose is unsatisfactory in comparison to the rotation and translation recovered using the method in [17] with the checkerboard; this method of pose estimation therefore does not yield a reasonable and acceptable exterior orientation and position. In addition, although the approach is supposed to enforce the orthogonality constraints on the rotation matrix, we observe that the estimated transformation does not satisfy these constraints. A factor influencing this poor performance could be the normal vectors and centers of the circles computed from the ellipse equations; these parameters cannot be validated, since no ground truth information is available for their actual values, and the authors of [1] do not present an evaluation method for them either. We conducted further experiments on circles with various radii and several orientations and positions; however, no improvement was observed.

Table 1. Estimated and ground truth [17] pose using the two circles.
Fig. 2. Experiment with two circles; detected circles marked in red. (Color figure online)

Fig. 3. Setup showing a metallic frame made of straight lines that are used for pose estimation (a). The camera poses with respect to the object reference frame (b).

Fig. 4. Reprojected bunny in three different camera positions and orientations, using the estimated pose (red) and the ground truth pose (yellow), with a closer view below each image. (Color figure online)

3.2 Experiments Using Straight Lines

This experiment is conducted with 21 different poses of the camera, illustrated in Fig. 3(b). In order to evaluate the impact of the number of lines on the accuracy of the pose estimation, we performed the experiment with three different line sets, containing 5, 10, and 15 to 25 lines (the last depending on the visibility of the detected lines in each image). Figure 4 depicts a 3D bunny reprojected onto the image plane with the ground truth pose (yellow bunny) and the estimated pose (red bunny) in three different camera positions and orientations. Visual inspection of the reprojections reveals that the estimated pose is accurate, at least qualitatively. In order to evaluate our results quantitatively, the ground truth pose was obtained with the calibration method of [17], using a checkerboard pattern for every single image. The reprojection error \(e\) is defined as the normalized sum of distances between the reprojections under the estimated and the ground truth poses:

$$\begin{aligned} e = \frac{1}{n}\sum _{i=1}^{n} \left\Vert P_{i\,Est} - P_{i\,GT}\right\Vert \end{aligned}$$
(14)

where \(n\) is the number of points in the 3D point cloud, \(P_{Est}\) denotes the points reprojected using the estimated pose, and \(P_{GT}\) those reprojected using the ground truth pose. The average reprojection error is 0.0015 px when using 5 or 10 lines, and 0.0016 px when using 15 to 25 lines, see Fig. 5(a). The slightly less accurate results of the third line set may be caused by the level of noise increasing as more lines are used; this noise stems from inaccuracies in the manual measurements or in the line detection in the image. The average execution time is 0.27 s for every line set, see Fig. 5(b), indicating that the method is efficient, with a fairly consistent run time regardless of the number of lines or the position of the camera.
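
For reference, (14) amounts to the mean Euclidean distance between the two reprojections; a short sketch (our naming):

```python
import numpy as np

def reprojection_error(P_est, P_gt):
    """Mean Euclidean distance between the two reprojections, Eq. (14).
    P_est, P_gt: (n, 2) arrays of reprojected image points in pixels."""
    return np.mean(np.linalg.norm(P_est - P_gt, axis=1))
```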

Fig. 5. Reprojection error (a) and execution time (b) comparison of the three line sets.

4 Conclusion

In this paper, we investigated methods of camera pose estimation using circle and line features. The experimental results on pose estimation from circle-ellipse correspondences demonstrate that this approach does not provide reliable solutions. For straight lines, however, the small reprojection error between the estimated and the ground truth poses confirms that the presented approach is accurate and its execution time is short; we hypothesize that the method could be useful for AR applications. In addition, it is suitable for pose estimation from varying numbers of lines. Finding correspondences between 2D and 3D information is one of the challenges that requires further investigation; an automatic algorithm to pair the correspondences would be helpful in real-time applications and in video sequences as well. Furthermore, employing more complex features (such as combinations of intersecting lines, or combinations of lines and circles) may improve the robustness of the estimation.