Coarse-to-fine dot array marker detection with accurate edge localization for stereo visual tracking
Introduction
Stereo visual tracking employs a calibrated stereo camera to localize a well-defined visual pattern (i.e., a visual marker) in real time with six degrees of freedom (DOF). The tracking algorithms follow a general framework: (1) detecting features of the visual marker in the stereo images and matching them over the image pair, (2) retrieving the features' 3-D information by triangulation, and (3) fitting the marker's geometry to obtain the rotation and translation. The salient features encoded in a visual marker are usually corners and edges; therefore, a visual marker commonly consists of square and/or dot patterns. Compared with a square, which provides four corners, a dot carries less information (only its center). However, the dot center is retrieved by fitting (many) edge points on the dot's contour (an ellipse), which reduces the localization uncertainty incurred by image sensor noise. Among visual markers, planar markers are most frequently used due to the ease of manufacture (printout), wide viewing angle, and minimal space occupation. By virtue of current printing technology, a printed marker has sufficient geometric accuracy for most applications.
Other than stereo visual tracking, a visual marker can also be used for camera calibration [1], [2], (single-camera) pose estimation [3], and high-level applications such as augmented reality (AR) [4] and visual servoing [5]. Stereo visual tracking can be considered a special case of pose estimation in which the second camera introduces additional constraints, so it should be more accurate than single-camera pose estimation under the same configuration. The essence of a visual marker in these computer vision tasks is to provide 3-D/2-D correspondences for solving a linear or nonlinear minimization problem. The first step relating to the visual marker is marker detection, i.e., extracting the encoded features and making them understandable. The accuracy and uncertainty of feature extraction largely influence the final output according to error propagation. Take stereo visual tracking as an example: the 3-D triangulation error of a feature in the depth direction is sensitive to the feature disparity, especially when the baseline is small or the marker is far from the camera. The uncertainty in localizing 2-D features is amplified through the triangulation procedure, resulting in large uncertainty in determining the depth. Suppose the stereo tracking results are used to visualize a tracked object in a virtual scene (e.g., in navigation systems): large tracking uncertainty leads to visible random fluctuation of the displayed object even when it actually remains static.
In the context of marker-based surgical instrument tracking under endoscopic view in minimally invasive surgery (MIS), a visual marker is attached to the surgical instrument (e.g., a forceps manipulator [6] or an endoscopic ultrasound probe [7], [8]) whose pose with respect to the endoscope inside the body is estimated by either PnP techniques (for a monocular endoscope) or stereo tracking (for a stereo endoscope). Limited by the small operative field inside the body, the visual marker is restricted to a small size, which deteriorates the accuracy and increases the uncertainty at a target point (e.g., the instrument tip) far from the marker features [9], [10]. Additionally, the baseline of a stereo endoscope is typically small (several millimeters), which makes the triangulation procedure sensitive to image noise. Eventually, a small uncertainty in localizing feature points may cause relatively large uncertainty in depth localization, which further results in large angular uncertainties around the marker plane. One way to improve visual marker detection is to increase the accuracy and reduce the uncertainty of feature localization in the presence of noise. For the tracking task, the detection time is also an important consideration.
There are several open-source visual marker systems available. ARToolKit is an open-source C/C++ library developed many years ago for building augmented reality (AR) applications by tracking a planar AR marker using pose estimation techniques [11]. Its AR marker is a planar pattern enclosed by a black rectangular frame on a white background. The corners of the rectangle are extracted by edge line fitting and used for pose estimation. The estimated pose is then used to overlay a virtual object on the video stream to create an AR scene. Inspired by ARToolKit, the ARTag marker system was proposed in 2005 for AR applications, with an improved marker pattern that reduces the false detection rate and inter-marker confusion rate [12]. ARToolKit and ARTag perform pose estimation in the same way, using the four extracted corners, but differ in how they carry information for marker recognition. Zhang evaluated the performance of several AR marker systems and reported that the standard deviation of localizing a static feature point varied from 0.26 to 0.57 pixel [13]. Assuming the error follows a normal distribution, the spread in locating the same feature point in a static state exceeds 1 pixel due to image sensor noise. An uncertainty of 1 pixel is quite large for both pose estimation and stereo tracking, especially when the marker is small and/or far from the camera.
The open-source computer vision library OpenCV [14] utilizes a black/white chessboard pattern for camera calibration. The corners of the chessboard pattern (X corners) are detected at the pixel level using quadrilateral detection and then iteratively optimized toward the saddle point at the sub-pixel level [15]. Other chessboard corner detection methods can be found elsewhere [16], [17], [18], [19]. We note that more attention has been paid to chessboard pattern detection, probably because of the ease of detection and sub-pixel localization of an X corner. However, the uncertainty of the above sub-pixel corner localization is larger than that of dot center extraction using ellipse fitting, because we can first localize the edge points on the dot's contour at the sub-pixel level, fit an ellipse to these edge points, and take the ellipse center as the dot center; the ellipse fitting further reduces the uncertainty of localizing the dot center. OpenCV also supports dot grid feature detection; however, it detects the dot contour at the pixel level, which results in a relatively large uncertainty in localizing the dot center. Therefore, a compact marker and a corresponding detection algorithm with high accuracy and low uncertainty of feature localization are needed.
This paper presents a coarse-to-fine dot array marker detection algorithm with accurate edge point localization. The marker is detected in a cascaded manner for efficiency. The dot contours are first detected quickly at the pixel level, and then sub-pixel edge point localization is performed by searching for the zero-crossing in the convolution of the image with a Laplacian-of-Gaussian (LoG) kernel [20]. The dot centers are finally extracted by ellipse fitting and re-ordered according to an orientation indicator. The contribution of this paper is twofold: a compact, configurable dot array marker detection framework that enables multi-marker tracking (coarse detection); and a closed-form sub-pixel edge localization method including both the formulation and the implementation (fine localization). The sub-pixel edge localization method could also be used in related vision tasks [21], [22], [23].
Section snippets
Dot array marker
Our dot array marker is inspired by the camera calibration pattern used by the commercial machine vision software MVTec Halcon [24]. The dot array marker is an m × n dot matrix enclosed by a rectangular frame, as shown in Fig. 1. The origin of the marker is located at the center of the rectangular frame. A solid triangle at one corner serves as an orientation indicator distinguishing the x and y directions. A dot array marker is hence characterized by (m, n, d, c), where d is the spacing; c
Edge point localization
In the proposed marker detection algorithm, sub-pixel edge point localization is performed once the marker has been recognized. Because marker recognition is a coarse procedure designed to save time, we need to finely localize every edge point on the contour Ei for all i. In addition, disturbed by image sensor noise and motion artifacts, the contours extracted from a thresholded image are not the true edges; however, they are expected to lie close to them. Therefore, it is
Marker pose tracking
If the marker is successfully detected in a stereo image pair (left and right images), 3-D triangulation is performed to reconstruct the 3-D coordinates of each dot center. Assume the stereo camera system has been calibrated and stereo-rectified so that it has a configuration of two parallel-looking cameras with identical intrinsic parameter matrices:

K = [ f 0 cx; 0 f cy; 0 0 1 ]

where f is the focal length and (cx, cy) is the principal point of the camera. The 3-D coordinates (x, y, z) of a 2-D correspondence (xl,
Experiments and results
We have implemented the dot array marker detection algorithm and the stereo visual tracker in C++ with the help of OpenCV. Except for the sub-pixel localization, all the operations in Algorithm 1 can be implemented with OpenCV functions. In this section, we evaluate the proposed algorithm on synthetic and real data, respectively, and then show real applications based on the proposed method.
Discussion and conclusion
In this paper, a coarse-to-fine marker detection algorithm with sub-pixel edge localization is presented. The dot array pattern is detected and matched to a predefined descriptor in a fast way using a simple threshold and hierarchical contour analysis. The resulting dot contours are coarse edge points which are expected to lie close to the "true" edges. If the marker has been successfully matched with one of the predefined descriptors, the marker region (a bounding box containing only the marker
Conflict of interest
None.
References (37)
- et al., Chess – quick and robust detection of chess-board features, Comput. Vis. Image Understand. (2014)
- et al., Efficient tracking and ego-motion recovery using gait analysis, Signal Process. (2009)
- et al., A sparse representation based fast detection method for surface defect detection of bottle caps, Neurocomputing (2014)
- et al., Topological structural analysis of digitized binary images by border following, Comput. Vis. Graph. Image Process. (1985)
- et al., Sub-pixel edge detection based on an improved moment, Image Vis. Comput. (2010)
- Accuracy of Laplacian edge detectors, Comput. Vis. Graph. Image Process. (1984)
- et al., Performance of three recursive algorithms for fast space-variant Gaussian filtering, Real-Time Imaging (2003)
- A flexible new technique for camera calibration, IEEE Trans. Pattern Anal. Mach. Intell. (2000)
- et al., Videoendoscopic distortion correction and its application to virtual guidance of endoscopy, IEEE Trans. Med. Imaging (2001)
- et al., EPnP: an accurate O(n) solution to the PnP problem, Int. J. Comput. Vis. (2009)
- Trends in augmented reality tracking, interaction and display: a review of ten years of ISMAR
- A new approach to visual servoing in robotics, IEEE Trans. Robot. Autom.
- Visual servoing-based endoscopic path following for robot-assisted laparoscopic surgery
- Robust intraoperative US probe tracking using a monocular endoscopic camera
- Intraoperative ultrasound guidance for transanal endoscopic microsurgery
- Designing optically tracked instruments for image-guided surgery, IEEE Trans. Med. Imaging
- Technical evaluation of a third generation optical pose tracker for motion analysis and image-guided surgery
- Virtual object manipulation on a table-top AR environment
Cited by (17)
- A closed-loop minimally invasive 3D printing strategy with robust trocar identification and adaptive alignment, Additive Manufacturing (2023)
- Robust and fast laparoscopic vision-based ultrasound probe tracking using a binary dot array marker, Computers in Biology and Medicine (2022)
  Citation excerpt: "Therefore, the orientation of the US probe (i.e. 2D US images) with respect to the laparoscopic camera can be obtained using the orientation of the fiducial marker and US probe with respect to the laparoscopic camera and marker, respectively. Several fiducial markers have been proposed for vision-based tracking, for example, chessboard markers [9,10], 3D random X-corner markers [11,12], dot array markers [13,14], and hybrid cylindrical markers [15]. The robustness of the chessboard and dot array markers is limited because they cannot be detected and identified when occlusions exist on the markers."
- Real-time robust individual X point localization for stereoscopic tracking, Pattern Recognition Letters (2018)
  Citation excerpt: "We believe an efficient and robust x point localization framework is helpful to customizing task-specific stereoscopic trackers and is important for the community. For example, with the proposed method, a stereo laparoscopic tracker [23] could be implemented to track forceps in minimally invasive surgery (MIS). In addition, by integrating a miniature projector which projects x point features into a stereo laparoscope, it is possible to reconstruct the organ surface three dimensionally intraoperatively [4]."
- Robust, fast and accurate vision-based localization of a cooperative target used for space robotic arm, Acta Astronautica (2017)
  Citation excerpt: "Ref. [23] utilized a marker that consists of concentric contrasting circles to estimate the 12 Degrees of Freedom relative state for small inspection spacecrafts. Ref. [24] presented a coarse-to-fine dot array marker tracking method and implemented it in a vivo animal experiment. Ref. [25] implemented fiducial markers around a lung tumor for dynamic tumor tracking."
- Generation of micro-scale finite element models from synchrotron X-ray CT images for multidirectional carbon fibre reinforced composites, Composites Part A: Applied Science and Manufacturing (2016)
- Efficient intraoral photogrammetry using self-identifying projective invariant marker, International Journal of Computer Assisted Radiology and Surgery (2024)