Coarse-to-fine dot array marker detection with accurate edge localization for stereo visual tracking

https://doi.org/10.1016/j.bspc.2014.09.008

Highlights

  • We present a coarse-to-fine dot array marker detection algorithm which can extract dot features with high accuracy and low uncertainty.

  • Sub-pixel edge point localization of the dot contour is performed by searching the zero-crossing in the convolution with a Laplacian-of-Gaussian kernel.

  • The marker detection algorithm yielded a feature detection error of less than 0.1 pixel with real-time performance.

  • The uncertainties in both localizing static 2-D dot features and 3-D pose tracking were clearly reduced by performing the sub-pixel localization.

Abstract

We present a coarse-to-fine dot array marker detection algorithm which can extract dot features with high accuracy and low uncertainty. The contribution of this paper is twofold: the first is a configurable dot array marker detection framework which enables real-time multi-marker tracking with a compact marker size (coarse detection); the second is a closed-form sub-pixel edge localization method, including its formulation and implementation (fine localization). For efficiency, the marker pattern together with the dot contours is first detected in a fast but coarse way, using simple thresholding and hierarchical contour analysis. If the marker pattern matches one of the predefined marker descriptors, sub-pixel edge point localization of the dot contour is performed within the detected marker region by searching for the zero-crossing in the convolution of the marker image with a Laplacian-of-Gaussian (LoG) kernel. A closed-form solution is proposed to localize the “true” edge point in a 3 × 3 neighborhood of a candidate pixel by solving a quartic equation. The dot centers are finally extracted by ellipse fitting and re-ordered according to an orientation indicator. The algorithm was evaluated on both synthetic and real image data, and also in real applications where stereo visual trackers were implemented using the proposed marker detection algorithm. Experimental results show that (1) the marker detection algorithm yielded a feature detection error of less than 0.1 pixel with real-time performance; (2) the uncertainties in both localizing static 2-D dot features and 3-D pose tracking were clearly reduced by performing the sub-pixel localization; and (3) the feasibility of the marker tracking under stereo laparoscopic views was confirmed in an in vivo animal experiment.

Introduction

Stereo visual tracking employs a calibrated stereo camera to localize a well-defined visual pattern (i.e., a visual marker) in real time with six degrees of freedom (DOF). The tracking algorithms follow a general framework: (1) detecting features of the visual marker in the stereo images and matching them across the image pair, (2) recovering the features' 3-D positions by triangulation, and (3) fitting the marker's geometry to obtain the rotation and translation. The salient features encoded in a visual marker are usually corners and edges, so a visual marker commonly consists of square and/or dot patterns. Compared with a square, which provides four corners, a dot carries less information (only its center). However, the dot center is obtained by fitting an ellipse to many edge points on the dot's contour, which reduces the localization uncertainty caused by image sensor noise. Among visual markers, planar markers are the most frequently used because they are easy to manufacture (by printing), offer a wide viewing angle, and occupy minimal space. By virtue of current printing technology, a printed marker has sufficient geometric accuracy for most applications.

Besides stereo visual tracking, a visual marker can also be used for camera calibration [1], [2], (single-camera) pose estimation [3], and some high-level applications such as augmented reality (AR) [4] and visual servoing [5]. Stereo visual tracking can be considered a special case of pose estimation in which a second camera introduces additional constraints, so it should be more accurate than single-camera pose estimation under the same configuration. The essence of a visual marker in these computer vision tasks is to provide 3-D/2-D correspondences for solving a linear or nonlinear minimization problem. The first step involving the visual marker is marker detection, i.e., extracting the encoded features and interpreting them. The accuracy and uncertainty of feature extraction largely influence the final output through error propagation. Taking stereo visual tracking as an example, the 3-D triangulation error of a feature in the depth direction is sensitive to the feature disparity, especially when the baseline is small or the marker is far from the camera. The uncertainty of localizing 2-D features is amplified by the triangulation procedure, resulting in large uncertainty in the estimated depth. If the stereo tracking results are used to visualize a tracked object in a virtual scene (e.g., in navigation systems), large tracking uncertainty leads to obvious random fluctuation of the displayed object even when it actually remains static.
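The amplification of 2-D uncertainty into depth uncertainty can be made concrete with first-order error propagation. The sketch below assumes an ideal rectified stereo pair, for which depth is z = f·b/disparity, so that a small disparity error maps to a depth error of roughly z²/(f·b) times that error. The function name and all numeric values (focal length, baseline, disparity noise) are hypothetical and chosen only for illustration.

```python
def depth_sigma(f_px, baseline_mm, depth_mm, disparity_sigma_px):
    """First-order depth uncertainty for a rectified stereo pair.

    From z = f * b / disparity it follows that |dz/d(disparity)| = z**2 / (f * b),
    so the depth standard deviation grows with the square of the depth and
    shrinks with the baseline.
    """
    return depth_mm ** 2 / (f_px * baseline_mm) * disparity_sigma_px


if __name__ == "__main__":
    # Hypothetical setup: 1000 px focal length, 5 mm baseline,
    # 0.1 px disparity noise.
    for depth in (50.0, 100.0, 200.0):
        sigma = depth_sigma(1000.0, 5.0, depth, 0.1)
        print(f"depth {depth:6.1f} mm -> depth sigma {sigma:.3f} mm")
```

Under these assumptions, doubling the working distance quadruples the depth uncertainty, while halving the baseline doubles it.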

In the context of marker-based surgical instrument tracking under endoscopic view in minimally invasive surgery (MIS), a visual marker is attached to the surgical instrument (e.g., forceps manipulator [6] or endoscopic ultrasound probe [7], [8]) whose pose with respect to the endoscope inside the body is estimated by either PnP techniques (for a monocular endoscope) or stereo tracking (for a stereo endoscope). Because the operative field inside the body is small, the visual marker is restricted to a small size, which degrades the accuracy and increases the uncertainty at a target point (e.g., the instrument tip) far from the marker features [9], [10]. Additionally, the baseline of a stereo endoscope is typically small (several millimeters), which makes the triangulation procedure sensitive to image noise. Consequently, a small uncertainty in localizing feature points may cause a relatively large uncertainty in the depth localization, which in turn results in large angular uncertainties around the marker plane. One way to improve visual marker detection is therefore to increase the accuracy and reduce the uncertainty of feature localization in the presence of noise. For the tracking task, the detection time is also an important consideration.

There are several open-source visual marker systems available. ARToolkit is an open-source C/C++ library developed many years ago for building augmented reality (AR) applications by tracking a planar AR marker using pose estimation techniques [11]. The AR marker used is a planar pattern enclosed by a black rectangular frame on a white background. The corners of the rectangle are extracted by edge line fitting and used for pose estimation. The estimated pose is then used to overlay a virtual object on the video stream to create an AR scene. Inspired by ARToolkit, the ARTag marker system was proposed in 2005 for AR applications, with an improved marker pattern that reduces the false detection rate and the inter-marker confusion rate [12]. ARToolkit and ARTag estimate the pose in the same way, from the four extracted corners, but differ in how they carry information for marker recognition. Zhang evaluated the performance of several AR marker systems and reported that the standard deviation of localizing a static feature point varied from 0.26 to 0.57 pixel [13]. Assuming the error follows a normal distribution, the variation in locating the same static feature point exceeds 1 pixel because of image sensor noise. An uncertainty of 1 pixel is quite large for both pose estimation and stereo tracking, especially when the marker is small and/or far from the camera.
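The step from a standard deviation of 0.26 to 0.57 pixel to a spread of more than 1 pixel can be checked with a short calculation. The text does not state which interval is meant; the sketch below assumes the width of the central 95% interval of a normal distribution, which is one plausible reading. The function name is hypothetical.

```python
from statistics import NormalDist


def interval_width(sigma_px, coverage=0.95):
    """Width of the central interval of a zero-mean normal distribution
    that contains the given fraction of samples (95% is an assumption)."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return 2.0 * z * sigma_px


if __name__ == "__main__":
    for sigma in (0.26, 0.57):  # range of standard deviations reported in [13]
        print(f"sigma = {sigma:.2f} px -> 95% spread = {interval_width(sigma):.2f} px")
```

Under this reading, even the smallest reported standard deviation corresponds to a spread slightly above 1 pixel.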

The open-source computer vision library OpenCV [14] uses a black-and-white chessboard pattern for camera calibration. The corners of the chessboard pattern (X corners) are detected at pixel level using quadrilateral detection and then iteratively refined toward the saddle point at sub-pixel level [15]. Other chessboard corner detection methods can be found elsewhere [16], [17], [18], [19]. We noticed that more attention has been paid to chessboard pattern detection, possibly because an X corner is easy to detect and to localize at sub-pixel level. However, the uncertainty of this sub-pixel corner localization is larger than that of dot center extraction using ellipse fitting, because the edge points on the dot's contour can first be localized at sub-pixel level and then fitted with an ellipse whose center is taken as the dot center; the fitting step further reduces the uncertainty of the dot center. OpenCV also supports dot grid feature detection, but it detects the dot contour at pixel level, which results in a relatively large uncertainty in localizing the dot center. Therefore, a compact marker and a corresponding detection algorithm with high accuracy and low uncertainty of feature localization are needed.
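To illustrate how a dot center can be recovered from many contour points, the following minimal sketch fits a general conic to edge points by ordinary least squares and reads off its center. It is a plain algebraic fit, not necessarily the ellipse fitting routine used in the paper, and the names `solve_linear` and `ellipse_center` are hypothetical. The points are shifted to their centroid first so that the conic can be normalized as A x² + B xy + C y² + D x + E y = 1.

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ellipse_center(points):
    """Least-squares conic fit to (x, y) edge points; returns the conic center."""
    mx = sum(p[0] for p in points) / len(points)
    my = sum(p[1] for p in points) / len(points)
    rows = []
    for px, py in points:
        x, y = px - mx, py - my
        rows.append([x * x, x * y, y * y, x, y])
    # Normal equations (R^T R) q = R^T 1 for the five conic coefficients.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    atb = [sum(r[i] for r in rows) for i in range(5)]
    a, b, c, d, e = solve_linear(ata, atb)
    # The center is where the gradient of the conic vanishes.
    det = 4.0 * a * c - b * b
    xc = (b * e - 2.0 * c * d) / det
    yc = (b * d - 2.0 * a * e) / det
    return xc + mx, yc + my


if __name__ == "__main__":
    # Synthetic contour: ellipse centered at (12.3, 45.6), semi-axes 5 and 3, rotated 0.4 rad.
    pts = []
    for k in range(40):
        t = 2.0 * math.pi * k / 40
        u, v = 5.0 * math.cos(t), 3.0 * math.sin(t)
        pts.append((12.3 + u * math.cos(0.4) - v * math.sin(0.4),
                    45.6 + u * math.sin(0.4) + v * math.cos(0.4)))
    print(ellipse_center(pts))
```

Because every contour point contributes to the five fitted coefficients, independent noise on individual edge points tends to average out in the estimated center.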

This paper presents a coarse-to-fine dot array marker detection algorithm with accurate edge point localization. For efficiency, the marker is detected in a cascaded manner. The contour of each dot is first detected quickly at pixel level, and sub-pixel edge point localization is then performed by searching for the zero-crossing in the convolution of the image with a Laplacian-of-Gaussian (LoG) kernel [20]. The dot centers are finally extracted by ellipse fitting and re-ordered according to an orientation indicator. The contribution of this paper is twofold: a compact, configurable dot array marker detection framework which enables multi-marker tracking (coarse detection); and a closed-form sub-pixel edge localization method, including its formulation and implementation (fine localization). The sub-pixel edge localization method could also be used in related vision tasks [21], [22], [23].
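The idea of locating an edge at the zero-crossing of the LoG response can be sketched in one dimension. The example below is a deliberate simplification: it convolves a 1-D intensity profile with a sampled second derivative of a Gaussian and finds the zero-crossing by linear interpolation between neighboring samples. It is not the closed-form 3 × 3 solution of the paper, and all function names and parameter values are hypothetical.

```python
import math


def log_kernel_1d(sigma, radius):
    """Sampled second derivative of a Gaussian (1-D LoG), shifted to sum to zero."""
    k = [(x * x / sigma ** 4 - 1.0 / sigma ** 2) * math.exp(-x * x / (2.0 * sigma ** 2))
         for x in range(-radius, radius + 1)]
    mean = sum(k) / len(k)
    return [v - mean for v in k]


def log_response(profile, kernel):
    """Convolution restricted to positions where the kernel fits; returns {index: response}."""
    r = len(kernel) // 2
    return {i: sum(kernel[j + r] * profile[i - j] for j in range(-r, r + 1))
            for i in range(r, len(profile) - r)}


def zero_crossings(response):
    """Sub-pixel positions where the response changes sign (linear interpolation)."""
    idx = sorted(response)
    found = []
    for a, b in zip(idx, idx[1:]):
        ra, rb = response[a], response[b]
        if ra * rb < 0.0:
            found.append(a + ra / (ra - rb))
    return found


def blurred_step(x, x0, s):
    """Ideal step edge at x0 blurred by a Gaussian of standard deviation s."""
    return 0.5 * (1.0 + math.erf((x - x0) / (s * math.sqrt(2.0))))


if __name__ == "__main__":
    # Synthetic edge at x = 10.3 sampled on integer pixel positions.
    profile = [blurred_step(x, 10.3, 1.0) for x in range(21)]
    print(zero_crossings(log_response(profile, log_kernel_1d(1.5, 6))))
```

Even with simple linear interpolation the estimate lands within a few hundredths of a pixel of the synthetic edge in this noise-free case; the closed-form 2-D solution described in the paper addresses the same problem in an image neighborhood.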

Section snippets

Dot array marker

Our dot array marker is inspired by the camera calibration pattern used by the commercial machine vision software MVTec Halcon [24]. The dot array marker is an m × n dot matrix enclosed by a rectangular frame, as shown in Fig. 1. The origin of the marker is located at the center of the rectangular frame. A solid triangle at one corner serves as an orientation indicator that distinguishes the x and y directions. A dot array marker is hence characterized by (m, n, d, c), where d is the spacing; c
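As an illustration of how the marker parameters define a model geometry, the sketch below generates the planar coordinates of the dot centers from m, n and d. It assumes that the dot grid is centered on the marker origin and that m counts dots along x and n along y; neither assumption is confirmed by the text above, and the function name is hypothetical. The parameter c is not used here.

```python
def dot_centers(m, n, d):
    """Model (x, y) coordinates of an m x n dot grid with spacing d.

    Assumes the grid is symmetric about the marker origin, with m dots
    along x and n dots along y. Points are returned row by row.
    """
    x0 = -(m - 1) * d / 2.0
    y0 = -(n - 1) * d / 2.0
    return [(x0 + i * d, y0 + j * d) for j in range(n) for i in range(m)]


if __name__ == "__main__":
    # Hypothetical 3 x 2 marker with 4 mm dot spacing.
    for point in dot_centers(3, 2, 4.0):
        print(point)
```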

Edge point localization

In the proposed marker detection algorithm, sub-pixel edge point localization is performed once the marker has been recognized. Because marker recognition is a coarse procedure designed to save time, we need to finely localize every edge point on the contour Ei for all i. In addition, because of image sensor noise and motion artifacts, the contours extracted from a thresholded image are not true edges, although they are supposed to be close to the true edges. Therefore, it is

Marker pose tracking

If the marker is successfully detected in a stereo image pair (left and right images), 3-D triangulation is performed to reconstruct the 3-D coordinates of each dot center. Assume the stereo camera system has been calibrated and stereo-rectified so that it has a configuration of two parallel-looking cameras with identical intrinsic parameter matrices:

\[
K = \begin{bmatrix} f & 0 & c_x \\ 0 & f & c_y \\ 0 & 0 & 1 \end{bmatrix}
\]

where f is the focal length and (cx, cy) is the principal point of the camera. The 3-D coordinates (x, y, z) of a 2-D correspondence (xl,
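For a rectified pair with the intrinsic matrix K above, triangulation reduces to the standard disparity relations. The sketch below uses those textbook relations and may differ in notation from the paper's own equations, which are cut off above. The baseline parameter, the choice of the left camera as the reference frame, and the function name are assumptions.

```python
def triangulate_rectified(xl, yl, xr, f, cx, cy, baseline):
    """Triangulate one correspondence of a rectified stereo pair.

    (xl, yl) is the feature in the left image and xr its column in the
    right image (same row after rectification). The result is expressed
    in the left camera frame, with the right camera assumed to be shifted
    by `baseline` along the positive x axis.
    """
    disparity = xl - xr
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = f * baseline / disparity
    x = (xl - cx) * z / f
    y = (yl - cy) * z / f
    return x, y, z


if __name__ == "__main__":
    # Hypothetical values: f = 800 px, principal point (320, 240), baseline 5 mm.
    print(triangulate_rectified(400.0, 200.0, 360.0, 800.0, 320.0, 240.0, 5.0))
```

With these hypothetical values the point (10, -5, 100) mm projects to the given pixel coordinates, so the function should return it unchanged.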

Experiments and results

We have implemented the dot array marker detection algorithm and the stereo visual tracker in C++ with the help of OpenCV. Except for the sub-pixel localization, all the operations in Algorithm 1 can be implemented with OpenCV functions. In this section, we evaluate the proposed algorithm on synthetic data and real data, respectively, and then present real applications based on our proposed method.

Discussion and conclusion

In this paper, a coarse-to-fine marker detection algorithm with sub-pixel edge localization is presented. The dot array pattern is detected and matched to a predefined descriptor quickly, using a simple threshold and hierarchical contour analysis. The resulting dot contours are coarse edge points which are supposed to be close to the “true” edges. If the marker has been successfully matched with one of the predefined descriptors, the marker region (a bounding box containing only the marker

Conflict of interest

None.

References (37)

  • F. Zhou et al.

    Trends in augmented reality tracking, interaction and display: a review of ten years of ISMAR

  • B. Espiau et al.

    A new approach to visual servoing in robotics

    IEEE Trans. Robot. Autom.

    (1992)
  • F. Nageotte et al.

    Visual servoing-based endoscopic path following for robot-assisted laparoscopic surgery

  • U. Jayarathne et al.

    Robust intraoperative US probe tracking using a monocular endoscopic camera

  • P. Pratt et al.

    Intraoperative ultrasound guidance for transanal endoscopic microsurgery

  • J. West et al.

    Designing optically tracked instruments for image-guided surgery

    IEEE Trans. Med. Imaging

    (2004)
  • J. Sánchez-Margallo et al.

    Technical evaluation of a third generation optical pose tracker for motion analysis and image-guided surgery

  • H. Kato et al.

    Virtual object manipulation on a table-top AR environment
