Modelling an arm's spatially correlated biases: an application to camera calibration for teleoperation
Introduction
While there are tasks in autonomous hand–eye coordination for which uncalibrated visual techniques are quite adequate (e.g. [1], [2]), vision processing for a teleoperation system has the rather different aim of recovering, transmitting and displaying a representation of the remote environs to a human operator who is attuned to looking at a Euclidean world. To achieve such a metric reconstruction, the cameras must first be calibrated. Using the teleoperated robot itself to generate the 3D positions for this task is an obvious possibility, and one which offers several advantages. First, the robot frame and world frame are united, so there is no need to discover the transformation between them separately; second, there is no need to introduce calibration objects into the workspace; third, the robot defines and hence spans the working volume; and finally, the method can be carried out automatically as a background task.
Unfortunately, a typical robot arm will not be able to point to perfect 3D positions, either absolutely or relatively, particularly when its movements span a large volume. Nor will the errors in the 3D positions (derived from the robot's joint angles via the forward kinematic transformation) follow a simple distribution. As discussed later, the nature of kinematic calibration of robot arms means that the positions suffer from spatially correlated bias.
The objective of this paper is to develop and apply a model of the error in the 3D points as a zero-mean Gaussian distribution with a covariance that depends on the proximity of points. If two points are close spatially, their errors are expected to be very similar; if far apart, uncorrelated. The model involves specifying a spatial coherence length.
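To make the error model concrete, the sketch below assembles such a proximity-dependent covariance matrix using a squared-exponential kernel, one common way to encode a coherence length; the kernel form, function name and parameter values are illustrative assumptions, not necessarily the exact form used in the paper.

```python
import numpy as np

def world_point_covariance(points, sigma=1.0, coherence_length=100.0):
    """Covariance of the 3D position errors at a set of world points.

    Errors at points much closer together than the coherence length are
    nearly identical; errors at points much further apart are effectively
    uncorrelated. (Illustrative squared-exponential kernel.)

    points           : (N, 3) array of 3D positions, e.g. in mm
    sigma            : standard deviation of each error component
    coherence_length : distance over which errors stay correlated
    """
    # Squared distances between all pairs of points, shape (N, N).
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    k = sigma ** 2 * np.exp(-d2 / (2.0 * coherence_length ** 2))
    # Apply the same spatial correlation independently to x, y and z.
    return np.kron(k, np.eye(3))  # (3N, 3N)
```

Note how two coincident points receive identical errors (correlation 1), while widely separated points revert to independent noise of variance sigma squared.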
The two major classes of calibration techniques in the literature lie on either side of what is required when a robot arm is available to point to 3D positions in the world frame. To one side are the “test-field” calibration methods, of which the work of Tsai is representative [3], [4], in which a pattern of perfectly known 3D points is placed in front of the camera to define the Euclidean world frame. On the other side are the auto- or self-calibration methods, which assume that the 3D scene is unknown and infer both the scene positions and the camera calibration parameters from image correspondences alone. The most general methods admit multiple solutions and are the least numerically stable [5], [6], [7], [8], [9]. A number of authors have demonstrated self-calibration under constraints of special motion or special scenes [10], [11], [12], [13], [14] and, considering the comparative accuracy of calibration methods, all urge the inclusion of prior information whenever it is available [15].
The approach developed here neither includes the 3D positions as absolutely known quantities, nor attempts to recover them ab initio. Rather, values recovered from the robot's forward kinematics are included as measurements along with their corresponding image positions in each camera, and the 3D positions are optimised in a bundle adjustment. Lavest et al. [16] took a broadly similar optimisation approach, but did not use a robot and were able to assume that all the measurements were corrupted only by random noise.
Section 3 details the noise models used for image and world data, en route to a maximum likelihood estimation of the camera parameters described in Section 4. Sections 5 and 6 describe the implementation and experimental work with a calibration tool attached to the end-effector of the Oxford teleoperation arm; the results are compared with camera parameters obtained using a conventional calibration grid. But first, Section 2 defines the problem domain and introduces the nomenclature and notation used in this paper.
Section snippets
Calibration with scene point adjustment
Without imposing priors, we wish to find an estimate of the set of camera parameters c = {cj}, 1 ≤ j ≤ J, for perspective cameras arranged around the robot workspace, using measurements of a set of world points {Xi}, 1 ≤ i ≤ I, and their corresponding image projections {xi(j)}, related by the homogeneous projection equation

xi(j) ≃ Pj Xi,

where the projection matrix Pj is a function of cj for the jth camera, and the equality holds only up to scale.
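As an illustration of this relation, here is a minimal sketch of the perspective projection, with the division that removes the unknown scale (function and variable names are assumed for illustration):

```python
import numpy as np

def project(P, X):
    """Project a 3D point X (length 3) through a 3x4 projection matrix P.

    The homogeneous equality x ~ P X holds only up to scale, so the
    inhomogeneous image position is obtained by perspective division.
    """
    Xh = np.append(X, 1.0)   # homogeneous world point (X, Y, Z, 1)
    xh = P @ Xh              # homogeneous image point, defined up to scale
    return xh[:2] / xh[2]    # inhomogeneous image coordinates
```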
Specifying the world points in a Euclidean frame
Models of 2D and 3D measurement error
As just mentioned, both the image positions x of the 3D points and the 3D points X themselves are used as measurements, the former found from the cameras and the latter derived from the robot arm's joint angles and forward kinematics. There is no correlation between the errors in the image and world measurement sets, so the overall covariance matrix is block diagonal:

Σ = diag(Σx, ΣX).
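A minimal sketch of that structure, assuming the image and world covariance blocks have already been built (names illustrative):

```python
from scipy.linalg import block_diag

def overall_covariance(cov_image, cov_world):
    """Joint covariance of the stacked image and world measurements.

    Because the image and world errors are uncorrelated with each other,
    the off-diagonal cross-covariance blocks are zero and the joint
    matrix is block diagonal.
    """
    return block_diag(cov_image, cov_world)
```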
Optimisation procedure
To minimise the Mahalanobis distance in Eq. (3) we use a Levenberg–Marquardt (LM) method (see [20]), a restricted-step optimisation method which circumvents the difficulties caused by non-positive-definite Hessian matrices in Newton's method. Rather than solving the normal equations of the Newton iteration

[J⊤Σ⁻¹J] Δ = J⊤Σ⁻¹ε

(with J the Jacobian of the stacked residual ε with respect to the parameters, and Σ the measurement covariance) for the parameter innovation vector Δ, each LM iteration multiplies the on-diagonal elements of the curvature matrix [J⊤Σ⁻¹J] by (1 + β).
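A minimal sketch of one such damped step, under the standard Gauss–Newton approximation of the Hessian for a Mahalanobis (weighted least-squares) cost; the function and variable names are illustrative:

```python
import numpy as np

def lm_step(J, residual, cov_inv, beta):
    """One Levenberg-Marquardt step for a weighted least-squares cost.

    J        : Jacobian of the residual w.r.t. the parameters, (m, n)
    residual : stacked measurement residuals, (m,)
    cov_inv  : inverse measurement covariance, (m, m)
    beta     : damping factor; larger values shrink the step
    """
    A = J.T @ cov_inv @ J          # curvature (Gauss-Newton Hessian)
    g = J.T @ cov_inv @ residual   # gradient term
    # Multiply the on-diagonal elements of A by (1 + beta): small beta
    # gives a Gauss-Newton step, large beta a short, gradient-like step.
    A_damped = A + beta * np.diag(np.diag(A))
    return np.linalg.solve(A_damped, g)   # parameter innovation, delta
```

In practice β is decreased after a successful step and increased after a failed one; a full bundle adjustment would also exploit the sparse block structure of the curvature matrix, which is not shown here.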
Implementation
The camera calibration method has been implemented using the slave Stäubli Puma arm of the Oxford teleoperation system [21], [22]. Two practical issues of particular importance are, first, the accuracy of location of the end-effector in the images where the cameras have quite different viewpoints and, second, the robustness of location. The issues are not entirely separable, but the second is the more important. Outlying data resulting from gross mislocation of the end-effector can have a disproportionate effect on the estimated parameters.
Experiments
Experiments were conducted to examine the validity of the model proposed in this paper using three cameras arranged around the Oxford teleoperated arm. The cameras were first calibrated using a calibration grid to provide a baseline for comparison. The experiments proper used the slave robot arm to supply the calibration pattern, with the camera parameters and world positions being estimated using three different noise models.
Conclusions and discussion
This paper has noted that camera calibration using a robot arm must not assume that the 3D positions pointed to by the arm are either accurate or corrupted only by random noise. Instead, the inaccuracies arise from spatially correlated biases. The paper goes on to propose a model of the positional covariance of 3D points in terms of a coherence length. Below the coherence length the positional errors are well correlated, but above the coherence length they can be regarded as random though with
Acknowledgements
This work was supported by Grants GR/L15005 and GR/N03266 from the UK Engineering and Physical Sciences Research Council. R.L.T. was supported by a Research Studentship, also from the EPSRC.
References (22)
- et al., Human–robot interface by pointing with uncalibrated stereo vision, Image and Vision Computing (1996)
- Computing a nearest symmetric positive semidefinite matrix, Linear Algebra and its Applications (1988)
- et al., Robot hand–eye coordination based on stereo vision, IEEE Control Systems Magazine (1995)
- R.Y. Tsai, A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses, IEEE Journal of Robotics and Automation (1987)
- R.Y. Tsai et al., A new technique for fully autonomous and efficient 3D robotics hand/eye calibration, IEEE Transactions on Robotics and Automation (1989)
- et al., A theory of self-calibration of a moving camera, International Journal of Computer Vision (1992)
- et al., Camera self-calibration: theory and experiments
- Auto-calibration and the absolute quadric
- et al., Euclidean reconstruction from constant intrinsic parameters
- et al., Self-calibration and metric reconstruction in spite of varying and unknown internal camera parameters
- Self-calibration of rotating and zooming cameras, International Journal of Computer Vision
Richard Thompson was an undergraduate and graduate student of Trinity College, University of Oxford, and was awarded an MEng in engineering science with first class honours in 1997. He gained his doctorate in 2001 for work on the integration of visual and haptic feedback for operators of teleoperated robots. Since 2001, he has been working at the MathWorks, Cambridge, UK, as an engineering consultant, specialising in vision, robotics and autocoding for embedded systems.
Ron Daniel is a Professor of Engineering Science at the University of Oxford. He has taught there since 1982, first as a research fellow, then as a tutorial fellow at Brasenose College. With Mike Brady, he helped establish Oxford's Robotics Research Group, and was co-founder and, for a decade, co-editor in chief of the journal Mechatronics. His research interests include mechatronics, human–machine interaction, and telerobotics.
David Murray took his first degree and doctorate in physics at the University of Oxford in 1977 and 1980, specialising in low-energy nuclear physics. He was a research fellow in physics at Caltech before joining GEC's research centre in London. He moved to the University of Oxford in 1989, where he is now a Professor of Engineering Science and a tutorial fellow of St Anne's College. His research interests now centre on active approaches to visual sensing, with application to surveillance, navigation, and wearable computing.