Robotics and Autonomous Systems

Volume 87, January 2017, Pages 313-328

Automatic calibration of camera sensor networks based on 3D texture map information

https://doi.org/10.1016/j.robot.2016.09.015

Highlights

  • Automatic calibration of complete 6DOF camera parameters using only 3D texture map information.

  • A novel image descriptor based on Quantized Line parameters in the Hough space (QLH) is proposed for 2D–3D matching-based automatic calibration.

  • Global estimation of the 6DOF camera parameters without initial conditions can be performed by applying a particle filter-based optimization scheme.

Abstract

To construct an intelligent space with a distributed camera sensor network, pre-calibration of all cameras (i.e., determining the absolute pose of each camera) is an essential task that is extremely tedious. This paper presents an automatic calibration method for camera sensor networks based on 3D texture map information of a given environment. In other words, this paper solves a global localization problem for the poses of the camera sensor networks given the 3D texture map information. To manage the complete calibration problem, we propose a novel image descriptor based on quantized line parameters in the Hough space (QLH) to perform a particle filter-based matching process between line features extracted from both a distributed camera image and the 3D texture map information. We evaluate the proposed method in a simulation environment with a virtual camera network and in a real environment with a wireless camera sensor network. The results demonstrate that the proposed system can calibrate complete external camera parameters successfully.

Introduction

Distributed sensor networks installed in external environments can recognize various events that occur in the space, so that such a space can be of great service in human–robot coexistence environments, as shown in Fig. 1. In recent years, many studies on intelligent spaces have been performed [1], [2]. Distributed camera sensor networks with multi-camera systems provide the most general infrastructure for constructing such an intelligent space. In order to obtain reliable information from such a system, pre-calibration of all the cameras in the environment (i.e., determining the absolute position and orientation of each camera) is an essential task that is extremely tedious. In this respect, several studies that provide Bayesian filter-based probabilistic estimates of optimal sensor parameters and target tracks have been conducted. Foxlin proposed the simultaneous localization and auto-calibration (SLAC) concept, which is a very general architectural framework for navigation and tracking systems with environment sensors [3]. Taylor et al. also implemented a simultaneous localization, calibration, and tracking (SLAT) system using radio and ultrasound pulse-based range sensors as environment sensors [4]. However, these methods can only be applied with range and bearing sensors and cannot make use of information from the camera sensor network, which is a popular network system for the intelligent space.

Chen et al. employed an approach that optimizes robot motion to minimize camera calibration errors; however, their approach assumes that the robot motion has no uncertainty, and rough parameters of the camera must be initialized by human observation [5]. Proposals by Rahimi et al. and Funiak et al. recovered the most likely camera poses and the target trajectory given a sequence of observations from the camera network [6], [7]. However, these approaches are limited in that they can only estimate 3 degree-of-freedom (DOF) poses (x,y,ϕ) under a restrictive assumption that requires aligning each camera’s ground-plane coordinate system with a global ground-plane coordinate system. The optimization problem that includes orientation parameters for all axes (ψ,θ,ϕ) could have myriad local minima without additional constraints, because many indistinguishable observations can exist even when the camera poses differ. For the reasons mentioned above, no previous research has attempted to estimate the complete 6DOF camera parameters.

To overcome this limitation, our research group proposed a novel calibration method that estimates the complete 6DOF poses (x,y,z,ψ,θ,ϕ) of camera sensor networks by applying additional constraints [8]. The additional constraints consist of terms related to grid map information and a two-way observation model based on the assumption that the camera and the target can observe each other. However, a mobile agent is essential for implementing this calibration method, so it cannot be applied where a mobile agent cannot be used. In this respect, we propose another approach that achieves a complete calibration scheme for the 6DOF external parameters without any mobile agent, instead using only the environmental map information, to accommodate situations in which a mobile agent cannot be used. Therefore, the proposed complete 6DOF calibration system makes it possible to install a camera network at arbitrary poses in the environment and easily calibrate its parameters under the assumption that there are no large structural alterations to the building (i.e., no wide discrepancies between the real environment and the map information).

In this approach, OctoMap, which is widely used to manage dense 3D environment models with texture information, is utilized [9], [10]. OctoMap divides the environment into irregular voxels that are managed in an Octree structure, which leads to very efficient memory management compared with a point-based structure. As shown in Fig. 2, the 3D texture map information can be utilized to generate virtual 2D images from arbitrary viewpoints (i.e., arbitrary 6DOF camera poses) using 3D projective geometry when the internal camera parameters are known; thus, the external camera parameters (i.e., the 6DOF pose) can be determined by matching the virtual images generated at every viewpoint with real images from the camera sensor networks.
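
To make this virtual-image generation step concrete, the following sketch projects textured OctoMap voxel centers into a virtual camera image with a pinhole model and a simple z-buffer. It is a minimal sketch under assumed data layouts (voxel centers and RGB textures as arrays, a known internal matrix K) and an assumed Euler-angle convention; the function names are illustrative and not the paper's implementation.

```python
import numpy as np

def rotation_matrix(psi, theta, phi):
    """Z-Y-X Euler rotation built from the three orientation angles.
    The axis convention is an assumption made for this sketch."""
    cz, sz = np.cos(phi), np.sin(phi)
    cy, sy = np.cos(theta), np.sin(theta)
    cx, sx = np.cos(psi), np.sin(psi)
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    return Rz @ Ry @ Rx

def render_virtual_image(voxel_centers, voxel_colors, w, K, img_size):
    """Project textured OctoMap voxel centers into a virtual image at pose w.

    voxel_centers : (N, 3) world coordinates of occupied voxels
    voxel_colors  : (N, 3) RGB texture of each voxel
    w             : 6DOF pose (x, y, z, psi, theta, phi)
    K             : (3, 3) known internal camera matrix
    img_size      : (height, width) of the virtual image
    """
    h, w_px = img_size
    x, y, z, psi, theta, phi = w
    R, t = rotation_matrix(psi, theta, phi), np.array([x, y, z])
    pts_cam = (R.T @ (voxel_centers - t).T).T        # world -> camera frame
    image = np.zeros((h, w_px, 3), dtype=np.uint8)
    depth = np.full((h, w_px), np.inf)
    in_front = pts_cam[:, 2] > 0.0                   # keep points ahead of the camera
    pix = (K @ pts_cam[in_front].T).T
    uv = (pix[:, :2] / pix[:, 2:3]).astype(int)      # perspective division
    for (u, v), d, c in zip(uv, pts_cam[in_front, 2], voxel_colors[in_front]):
        if 0 <= u < w_px and 0 <= v < h and d < depth[v, u]:
            depth[v, u] = d                          # simple z-buffer: keep closest voxel
            image[v, u] = c
    return image
```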

The problem addressed in this study can be considered similar to the image-based self-localization problem of mobile robots, which has been studied extensively over the past few years [11], [12], [13], [14], [15]. However, the solutions to this problem cannot be applied to complete 6DOF estimation but only to position estimation (or 3DOF estimation), because the motion of mobile robots is expressed by 3DOF in 2D space. Furthermore, these methods can only be applied using omni-directional cameras, which efficiently acquire surrounding information of a large environment, and cannot make use of information from normal cameras. In particular, the method proposed by Ishizuka et al. achieved self-localization of mobile robots by matching 2D edge points observed from an omni-directional camera to 3D edge points obtained from the 3D environment model [15]. The equivalent method used in this study is called a 2D–3D edge matching scheme, and several 6DOF registration techniques for a 3D geometric model (i.e., 3D map information) and 2D image data have also been proposed previously [16], [17], [18], [19], [20]. Most of these registration methods are based on the correspondence of 2D photometric edges and projected 3D geometrical edges on the 2D image plane. However, it is difficult to find corresponding edges correctly because robustly extracted edges are limited. Moreover, the initial pose should be manually set close to the correct pose to avoid being stuck in local minima. To overcome the limitations of setting the initial registration, Hara et al. proposed a new registration algorithm that can estimate an optimal pose robustly against initial registration errors [21]. However, it is not completely free from the initial setting: the maximum allowed error of the initial registration is 2 m and 20 degrees for each axis. In conclusion, these registration schemes can be applied to matching 3D texture map information with 2D image data; however, initial registration by human observation must be performed, and a global search algorithm (i.e., seeking a 6DOF solution in the global space) has yet to be established. Realizing a global search for the 6DOF solution with no strong constraints is impossible because of the myriad of local minima; thus, no previous study has attempted this kind of approach. Here, the map information is very useful for reducing the solution space (i.e., the search space for the 6DOF camera poses), given that cameras are generally installed on occupied regions, such as interior walls, because of space limitations.

The contributions of this paper are as follows. The limitations of earlier approaches are that they can estimate only 3DOF parameters (x,y,ϕ) under restrictive assumptions and that a mobile agent is needed for similar calibration schemes, as mentioned above. In contrast, the proposed complete 6DOF calibration system uses only the environment map information; therefore, the proposed scheme easily calibrates its parameters without any mobile agent. Moreover, because we apply a novel matching scheme with a line feature-based descriptor that can manage some occlusion and clutter, the proposed calibration framework is relatively robust to illumination changes and also tolerates small changes in the environment (i.e., discrepancies between the 3D map information and the camera image data) compared with color information-based approaches. In addition, the proposed calibration system requires no initial conditions because the particle filter-based approach, which is adopted as the main paradigm for the proposed calibration task, has the ability to solve the global estimation problem, in contrast to the previous local estimation scheme [21].

The remainder of this paper is organized as follows. Section 2 presents an overview of the proposed calibration process based on 3D texture map information. Section 3 describes in detail the parameterization step, which converts the 3D texture map information into a simpler representation. A novel image descriptor for a fast and accurate line-based matching process is presented in Section 4. Section 5 presents the particle filter-based parameter calibration step. The effectiveness of the proposed calibration scheme is evaluated with experimental results in Section 6. Finally, Section 7 concludes the paper.

Section snippets

Overview of proposed calibration process

We can take maximum likelihood (ML) or maximum a posteriori (MAP) estimation methods into consideration to find the optimal camera pose w*, as follows:

$$\begin{aligned}
\mathbf{w}^{*} &= \arg\max_{\mathbf{w}}\, p(\mathbf{w} \mid I_R, M_{oct}) \\
&= \arg\max_{\mathbf{w}}\, p(I_R \mid \mathbf{w}, M_{oct})\, p(\mathbf{w} \mid M_{oct}) \\
&= \arg\min_{\mathbf{w}}\, \bigl[-\log p(I_R \mid \mathbf{w}, M_{oct})\, p(\mathbf{w} \mid M_{oct})\bigr] \\
&= \arg\min_{\mathbf{w}}\, \bigl(I_V(\mathbf{w}) - I_R\bigr)^{\top} \Omega_I \bigl(I_V(\mathbf{w}) - I_R\bigr) \\
&= \arg\min_{\mathbf{w}} \sum_{(u,v)\in I} \bigl\| I_V(\mathbf{w})(u,v) - I_R(u,v) \bigr\|^{2},
\end{aligned}$$

where w = [x_c y_c z_c ψ_c θ_c ϕ_c]^T denotes the 6DOF camera pose and I_V(w) represents the virtual image generated from the arbitrary camera pose w. I_R is the real image from the camera sensor
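
Read directly, the final line of the cost above is a sum of squared intensity differences between the rendered virtual image and the real image. The sketch below evaluates that photometric cost for a set of candidate poses; it is illustrative only (the proposed method instead matches line features, as described later), and render_fn is an assumed wrapper around the map information and the camera intrinsics.

```python
import numpy as np

def pose_cost(w, I_R, render_fn):
    """Sum of squared intensity differences between the virtual image rendered
    at pose w and the real camera image I_R (the last line of the cost above).
    `render_fn(w)` is an assumed wrapper around the map and the intrinsics."""
    I_V = render_fn(w).astype(np.float64)
    diff = I_V - I_R.astype(np.float64)
    return float(np.sum(diff ** 2))

def best_pose_brute_force(candidate_poses, I_R, render_fn):
    """Pick the candidate pose with the minimum photometric cost."""
    costs = [pose_cost(w, I_R, render_fn) for w in candidate_poses]
    return candidate_poses[int(np.argmin(costs))]
```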

Parameterization of 3D geometric lines based on 3D texture OctoMap

We can consider direct parameterization of 3D geometric line segments from the 3D map information, as shown in Fig. 5(a), because the 3D map appears clear and valid for extracting the major line segments by estimating the intersections of the planes. However, the map structure consists not of planes but of many voxels (i.e., the Octree structure mentioned in Section 1). Thus, additional processing is required to estimate geometric plane information from the voxels in
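
One possible form of that additional processing is sketched below: planes are fitted to occupied-voxel centers with a generic RANSAC estimator, and a 3D line is obtained from the intersection of two fitted planes. The estimator, tolerances, and helper names are assumptions for illustration; the paper's exact plane-extraction procedure may differ.

```python
import numpy as np

def fit_plane_ransac(points, n_iter=500, tol=0.02, rng=None):
    """Generic RANSAC plane fit over occupied-voxel centers.
    Returns ((n, d), inlier_mask) with the plane defined by n·p + d = 0."""
    rng = rng or np.random.default_rng(0)
    best_model, best_inliers = None, None
    for _ in range(n_iter):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(n)
        if norm < 1e-9:
            continue                                  # degenerate (collinear) sample
        n = n / norm
        d = -np.dot(n, sample[0])
        inliers = np.abs(points @ n + d) < tol        # distance-to-plane test
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_model, best_inliers = (n, d), inliers
    return best_model, best_inliers

def plane_intersection_line(plane1, plane2):
    """3D line (point, unit direction) at the intersection of two planes (n, d)."""
    (n1, d1), (n2, d2) = plane1, plane2
    direction = np.cross(n1, n2)
    # One point on the line: solve n1·p = -d1, n2·p = -d2, direction·p = 0.
    A = np.vstack([n1, n2, direction])
    b = np.array([-d1, -d2, 0.0])
    point = np.linalg.solve(A, b)
    return point, direction / np.linalg.norm(direction)
```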

Concept of QLH descriptor

In order to estimate camera parameters based on 3D texture map information, a novel feature comparison model is required for a fast and accurate matching process. In other words, as shown in Fig. 13, an image descriptor (i.e., a compressed, low-dimensional signature) should be defined to compare image data from a real camera with virtual image data from arbitrary camera poses in the map information.

To this end, the concept of histograms is widely exploited in image matching processes
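
The sketch below illustrates a histogram-style signature over quantized line parameters in the Hough space: lines detected in an image are binned over quantized (ρ, θ) values, and the normalized histogram is compared by histogram intersection. The edge detector, thresholds, bin counts, and similarity measure are assumptions for illustration and are not the paper's exact QLH definition.

```python
import cv2
import numpy as np

def qlh_descriptor(image, rho_bins=16, theta_bins=18):
    """Histogram over quantized (rho, theta) line parameters in the Hough space.
    Edge detector, thresholds, and bin counts are illustrative assumptions."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLines(edges, 1, np.pi / 180, 80)    # each line as (rho, theta)
    hist = np.zeros((rho_bins, theta_bins))
    if lines is None:
        return hist.ravel()
    rho_max = float(np.hypot(*gray.shape))               # image diagonal bounds |rho|
    for rho, theta in lines[:, 0]:
        r = int(np.clip(abs(rho) / rho_max * rho_bins, 0, rho_bins - 1))
        t = int(np.clip(theta / np.pi * theta_bins, 0, theta_bins - 1))
        hist[r, t] += 1.0
    hist = hist.ravel()
    return hist / (hist.sum() + 1e-9)                    # normalized signature

def descriptor_similarity(h1, h2):
    """Histogram intersection as a simple similarity score between signatures."""
    return float(np.minimum(h1, h2).sum())
```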

Particle filter-based parameter calibration

A particle filter is used as the main paradigm for the calibration task in this study. It is a popular method for implementing a Bayesian filter that can estimate a probability distribution using a set of random particles. The state (i.e., the 6DOF camera pose in this case) is represented by the weighted sum of all particles. The particle filter has the ability to solve not only a local estimation problem, but also a global estimation problem with high accuracy because it can represent
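
A minimal particle-filter sketch of this global estimation step is given below: particles over 6DOF poses are weighted by the similarity between the real image's descriptor and the descriptor of the virtual image rendered at each particle pose, then resampled and perturbed. The helpers sample_pose (which could restrict samples to occupied wall regions of the map, as noted in Section 1) and render_desc, as well as the noise magnitudes, are assumptions for illustration.

```python
import numpy as np

def calibrate_pose_particle_filter(real_desc, sample_pose, render_desc,
                                   n_particles=2000, n_iter=30, rng=None):
    """Global 6DOF estimation with a particle filter.
    `sample_pose()` draws a random pose (e.g., restricted to occupied wall
    regions of the map) and `render_desc(w)` returns the descriptor of the
    virtual image at pose w; both helpers are assumptions of this sketch."""
    rng = rng or np.random.default_rng(0)
    particles = np.array([sample_pose() for _ in range(n_particles)])
    for _ in range(n_iter):
        # Weight each particle by descriptor similarity (histogram intersection).
        weights = np.array([np.minimum(real_desc, render_desc(w)).sum()
                            for w in particles])
        weights = weights / (weights.sum() + 1e-12)
        # Importance resampling in proportion to the weights.
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = particles[idx]
        # Perturb to keep particle diversity; noise magnitudes are illustrative.
        particles = particles + rng.normal(
            0.0, [0.05, 0.05, 0.05, 0.01, 0.01, 0.01], size=particles.shape)
    # Averaging Euler angles directly is a simplification adequate for a sketch.
    return particles.mean(axis=0)
```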

Simulation

A simulation was conducted with a virtual camera network of up to three cameras. Fig. 19(a) and (b) show the 3D texture OctoMap information of the simulation environment. The OctoMap information in this simulation has 623,613 nodes, of which the minimum voxel size (i.e., cube edge length) is 20 mm. The size of the simulation environment is 5 m × 8 m × 3 m, including various line features located on the sides of the walls, doors, and windows. Here, the real poses of the virtual cameras

Conclusion

It has been impossible to achieve a global estimation of complete 6DOF camera parameters with no strong constraints because of the myriad of local minima. To overcome this difficulty, a novel approach for an automatic and complete parameter calibration system that uses 3D texture map information for camera sensor networks was proposed in this study. The particle filter-based approach, which is an implementation of the Bayesian filter, was used to estimate the complete camera poses. The


References (51)

  • Hornung, A. et al., OctoMap: An efficient probabilistic 3D mapping framework based on octrees, Auton. Robots (2013)
  • OctoMap, 2016. [Online]. Available: http://octomap.github.io/. (Accessed 12 February...
  • Murillo, A.C. et al., SURF features for efficient robot localization with omnidirectional images
  • Ramalingam, S. et al., Geolocalization using skylines from omni-images
  • Mariottini, G.L. et al., Uncalibrated video compass for mobile robots from paracatadioptric line images
  • Iwasa, H. et al., Memory-based self-localization using omnidirectional images, Syst. Comput. Japan (2003)
  • Ishizuka, D. et al., Self-localization of mobile robot equipped with omnidirectional camera using image matching and 3D–2D edge matching
  • Elstrom, M.D. et al., Stereo-based registration of multi-sensor imagery for enhanced visualization of remote environments
  • Stamos, I. et al., Automatic registration of 2-D with 3-D imagery in urban environments
  • Kurazume, R. et al., Simultaneous 2D images and 3D geometric model registration for texture mapping utilizing reflectance attribute
  • Liu, L. et al., Automatic 3D to 2D registration for the photorealistic rendering of urban scenes
  • Iwashita, Y. et al., Fast alignment of 3D geometrical models and 2D grayscale images using 2D distance maps, Syst. Comput. Japan (2007)
  • Hara, K. et al., Robust 2D–3D alignment based on geometrical consistency
  • Smith, A. et al., Sequential Monte Carlo Methods in Practice (2013)
  • Yeon, S. et al., Robust-PCA-based hierarchical plane extraction for application to geometric 3D indoor mapping, Ind. Robot, Int. J. (2014)

    Yonghoon Ji received his B.S. degrees from the Department of Mechanical Engineering and the Department of Computer Engineering, Kyunghee University, Korea, in 2010. He received his M.S. degree from the Department of Mechatronics, Korea University, Korea, in 2012, and his Ph.D. degree from the Department of Precision Engineering, the University of Tokyo, Japan, in 2016. He is an overseas researcher under a Postdoctoral Fellowship of the Japan Society for the Promotion of Science (JSPS). His research interests cover mobile robotics, unmanned vehicle technologies, and intelligent space. He is a member of IEEE, RSJ, JSME and SICE.

    Atsushi Yamashita received his B.E., M.E., and Ph.D. degrees from the Department of Precision Engineering, the University of Tokyo, Japan, in 1996, 1998, and 2001, respectively. From 1998 to 2001, he was a Junior Research Associate at RIKEN (The Institute of Physical and Chemical Research). From 2001 to 2008, he was an Assistant Professor at Shizuoka University. From 2006 to 2007, he was a Visiting Associate at the California Institute of Technology. From 2008 to 2011, he was an Associate Professor at Shizuoka University. Since 2011, he has been an Associate Professor in the Department of Precision Engineering, the University of Tokyo. His research interests include robot vision, image processing, multiple mobile robot systems, and motion planning. He is a member of ACM, IEEE, JSPE, RSJ, IEICE, JSME, IEEJ, IPSJ, ITE and SICE.

    Hajime Asama received his B.S., M.S., and Dr. Eng. degrees from the University of Tokyo, Japan, in 1982, 1984, and 1989, respectively. He was a Research Associate, Research Scientist, and Senior Research Scientist at RIKEN (The Institute of Physical and Chemical Research, Japan) from 1986 to 2002. He became a Professor of RACE (Research into Artifacts, Center for Engineering), the University of Tokyo, in 2002, and has been a Professor in the School of Engineering, the University of Tokyo, since 2009. Currently, he is the chairman of the Task Force for Remote Control Technology of the Council for the Decommissioning of TEPCO’s Fukushima Daiichi NPS, the leader of the Project on Disaster Response Robots and their Operation System of the Council on Competitiveness-Japan, and the chairman of the Robotics Task Force for Anti-Disaster (ROBOTAD). His main research interests are distributed autonomous robotic systems, smart spaces, service engineering, Mobiligence, and service robotics.

    This work was supported in part by the Tough Robotics Challenge, ImPACT Program (2015-PM07-02-01) (Impulsing Paradigm Change through Disruptive Technologies Program).
