3D target recognition based on projective invariant relationships

https://doi.org/10.1016/S1047-3203(02)00013-5

Abstract

In this paper, we propose a new 3D target recognition algorithm that uses a single image. Our approach is based on geometrically invariant relationships. By employing a practical CCD camera model for projective image formation and analyzing the constraints on the projection parameters, we derive two invariant relationships from six pairs of 3D space features and corresponding image features, such as points and lines. Compared to the conventional approach, in which only a single invariant relationship is derived from six point pairs under the general finite projective camera model, the two relationships obtained from the practical camera model increase the effectiveness of matching. Based on the derived invariant relationships, a new view-invariant object recognition algorithm using a single image is proposed. The performance of the algorithm is demonstrated by computer simulations on synthetic and real images of objects. The experimental results show that the proposed algorithm robustly recognizes 3D objects in an image using line features.

Introduction

One of the main goals of computer vision is to recognize and understand objects in a scene. In recognizing objects in a 2D image, we are usually confronted with difficult problems caused by the projective transform inherent in the 2D imaging process, such as perspective distortion and dependence on the viewing direction (Duda and Hart, 1973). To avoid these undesirable projective effects, complicated high-level techniques could be employed (Haralick, 1980; Lutton et al., 1994). However, owing to their inherent viewpoint-invariant property, techniques based on geometric invariants (Mundy and Zisserman, 1992) are known to provide quite effective solutions to these problems, and have therefore received considerable interest in computer vision research in recent years. Several types of geometric invariants have been introduced in the literature, including algebraic invariants (Forsyth et al., 1991; Barrett et al., 1991; Carlsson, 1996) and differential invariants (Weiss, 1993a, Weiss, 1993b). However, since most of them are associated with coplanar features and can be applied only to simple planar object recognition tasks, these 2D invariants are not sufficient to recognize 3D objects in general configurations (Forsyth et al., 1991; Carlsson, 1996; Weiss, 1993a, Weiss, 1993b).

Recently, promising efforts to analyze and construct invariants for general 3D space features have been made by several researchers. Burns et al. (1993) have shown that general-case view-invariants do not exist for any number of points, asserting that no invariant can be found from a single image for 3D points in general configuration. This fact leads one to use additional information on the geometric relations of the points. One approach is to use multiple images of the underlying object. Using three images, Long Quan (1995) has presented invariants for six space points, provided that the correspondences of the points are known. Another approach is to make restrictive assumptions on the geometric configuration of the feature points. Rothwell et al. (1993) have derived projective invariants for point features of two classes of objects: polyhedral objects and objects possessing bilateral symmetry. Zhu et al. (1996) have also shown the existence of an invariant for six points consisting of two sets of four coplanar points lying on two adjacent planes. Weiss and Ray (1998) have provided an approach to recognizing 3D objects from a single image, based on the relationship between 3D and 2D invariants. To increase the efficiency of recognition, model assumptions such as symmetry are also used.

Another approach to invariant-based recognition is to use invariant relationships. Recent research has shown that although no general-case invariant exists, an invariant relationship between space points and the corresponding projected points in the image plane can be established, and this relationship serves as a powerful tool for various vision tasks, such as model-based recognition from a single image. Long Quan (1995) has shown that the coordinates of six space points and those of the corresponding image points satisfy one invariant equation, regardless of the viewing direction. This relationship is derived by eliminating the 11 parameters of the general projection matrix, using the correspondences of six pairs of space and image points.
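The parameter-counting argument behind this single relationship can be sketched as follows (a standard derivation consistent with the description above, not an excerpt from the paper):

```latex
% General finite projective camera: each image point is the projection
% of a space point by a 3x4 matrix P defined up to scale (11 dof).
\lambda_i\,\mathbf{x}_i = \mathsf{P}\,\mathbf{X}_i, \qquad i = 1,\dots,6 .
% Each point pair contributes two independent equations in the entries
% of P, so six pairs give 12 equations in 11 unknowns.  Eliminating
% the 11 projection parameters therefore leaves exactly one constraint
% relating the coordinates of the six space points and the six image
% points: the invariant relationship.
```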

In this paper, we focus on constructing new invariant relationships for 3D target recognition from a single image, by employing a practical CCD camera model, which has 10 degrees of freedom (Hartley and Zisserman, 2000). Compared to the general finite projective model, which has 11 degrees of freedom (Hartley and Zisserman, 2000), the employed camera model yields additional constraints on the components of the projection matrix, so that all the projection parameters can be determined using only five pairs of corresponding 3D space and image points. As a consequence, two new invariant relationships are obtained for the sixth pair of corresponding space and image points, which can be used to verify the 3D model. Moreover, we extend this notion to invariant relationships between 3D space lines and their 2D projections. Although the mapping between 3D space lines and the corresponding image lines is not the same as that between points, similar results can be drawn using the same constraint on the projection parameters. Applications of these point and line invariant relationships to model-based 3D object recognition from a single perspective view are also presented. In the proposed recognition algorithm, the 3D reference model is decomposed into two sets of features: a set of five reference features and a set of verifying features. Once features are extracted from an input image by reliable preprocessing, such as corner or line detection, matching is performed in an assume-and-verify manner, in which the projection matrix is determined from an arbitrary set of five image features and then checked against the remaining verifying features.
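One reading of the 10-degree-of-freedom CCD model cited from Hartley and Zisserman (2000) is a zero-skew calibration matrix; the notation below is standard and is an assumption on our part, not reproduced from the paper:

```latex
% Practical CCD camera: rectangular pixel grid, zero skew.
\mathsf{K} =
\begin{pmatrix}
  f_x & 0   & c_x \\
  0   & f_y & c_y \\
  0   & 0   & 1
\end{pmatrix},
\qquad
\mathsf{P} = \mathsf{K}\,[\,\mathsf{R} \mid \mathbf{t}\,].
% K contributes 4 internal parameters (f_x, f_y, c_x, c_y) and
% [R | t] contributes 6 external ones, so P has 10 degrees of freedom,
% one fewer than the general finite projective camera.  The missing
% skew entry induces an extra constraint among the entries of P, which
% is why five point pairs (10 equations) suffice to determine it.
```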

In this paper, recovering the projection parameters requires solving a fourth-order equation, which is too computationally expensive for fast matching. We therefore simplify the problem by constraining the model structure: the reference features include four coplanar points for point models, or three coplanar lines for line models, so that the unknown parameters can be eliminated by solving only a second-order equation.

This paper is composed of seven sections. Section 2 describes the practical pinhole camera model and the characteristics of the projection matrix. Invariant relationships for points and lines are derived in Sections 3 and 4, respectively. The proposed 3D object recognition algorithm based on the invariant relationships is described in Section 5. Experimental results on several images are presented in Section 6, and conclusions are drawn in Section 7.

Section snippets

Practical pinhole camera model

The projection of a 3D space point onto an image plane through a camera is governed by both internal and external factors. Typical external factors are the translation and rotation of an object which characterize the position and orientation of the object, with respect to the focal origin of a camera. There are two types of internal factors: one is from the camera parameters, such as the focal length, the zoom, and the field of view, and the other comes from the parameters relating to the
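A minimal numerical sketch of this pinhole projection, combining the external factors (rotation and translation) with zero-skew internal parameters; all values below are hypothetical, chosen only for illustration:

```python
import numpy as np

# Internal factors (hypothetical values): focal lengths and principal point.
fx, fy = 800.0, 800.0
cx, cy = 320.0, 240.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])   # zero-skew CCD calibration matrix

# External factors: orientation and position of the object w.r.t. the camera.
R = np.eye(3)                      # rotation
t = np.array([0.0, 0.0, 5.0])      # translation (object 5 units in front)

P = K @ np.hstack([R, t.reshape(3, 1)])   # 3x4 projection matrix

X = np.array([1.0, 0.5, 0.0, 1.0])        # homogeneous 3D space point
x = P @ X
u, v = x[0] / x[2], x[1] / x[2]           # dehomogenize to pixel coordinates
# (u, v) = (480.0, 320.0)
```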

Invariant relationships among 3D points and image points

We now derive two invariant relationships between a space point and its projected image point, using the new constraint obtained in the previous section. This is accomplished as follows: first, the unknown components of the normalized projection matrix T̃ are determined from five correspondences of space and image points; then, any additional space point and its corresponding image point are related by the two resulting invariant relationships.
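The paper's five-correspondence solution for the constrained matrix is not reproduced in these snippets. As a point of comparison, the unconstrained 11-parameter case can be solved linearly from six point pairs by the standard direct linear transform; the sketch below (function name ours) recovers the projection matrix as the null vector of the stacked correspondence equations:

```python
import numpy as np

def dlt_projection_matrix(X, x):
    """Estimate a 3x4 projection matrix P from n >= 6 correspondences
    (X: n x 4 homogeneous space points, x: n x 3 homogeneous image
    points) by the direct linear transform.  P is defined up to scale."""
    rows = []
    for Xi, xi in zip(X, x):
        u, v, w = xi
        # Each pair gives two equations:  w*p1.X - u*p3.X = 0,
        #                                 w*p2.X - v*p3.X = 0.
        rows.append(np.concatenate([w * Xi, np.zeros(4), -u * Xi]))
        rows.append(np.concatenate([np.zeros(4), w * Xi, -v * Xi]))
    A = np.array(rows)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)   # right null vector of A, up to scale
```

With exact data from six points in general position the null space is one-dimensional, so the matrix is recovered up to a scale factor; the practical model of this paper adds one more constraint, which is what reduces the requirement to five pairs.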

Invariant relationships for lines

In this section, we derive invariant relationships between 3D lines and their projected image lines. By employing the constraint on the projection parameters discussed in the previous section, we show that results similar to those for point projection hold for line projection: five pairs of corresponding space and image lines are sufficient to determine the normalized projection matrix, providing two invariant relationships for any other line pair.

Proposed recognition algorithm

The normalized projection matrix is now completely determined using the correspondences of five pairs of points or lines. Then, the invariant relationships for the points or lines derived in the previous sections can be used effectively for 3D object recognition. Figure 1 shows the block diagram for the proposed recognition algorithm.

Suppose we are given a 3D model with M space features (points or lines), which are decomposed into two sets: five reference features and the remaining (M−5) verifying features.
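The assume-and-verify loop described above can be sketched as follows for point features. This is an illustrative skeleton, not the paper's implementation: `solve_projection` is a hypothetical placeholder for the five-correspondence parameter recovery, and the pixel-distance test stands in for checking the invariant relationships.

```python
import itertools
import numpy as np

def recognize(model_pts, image_pts, solve_projection, tol=2.0):
    """Assume-and-verify matching.  model_pts: (M, 3) model points,
    the first five being the reference features; image_pts: (N, 2)
    detected image points.  For each hypothesized assignment of five
    image points to the reference features, solve for the projection
    and verify it on the remaining M-5 model features."""
    M = len(model_pts)
    ref, verify = model_pts[:5], model_pts[5:]
    verify_h = np.hstack([verify, np.ones((M - 5, 1))])  # homogeneous
    for cand in itertools.permutations(range(len(image_pts)), 5):
        P = solve_projection(ref, image_pts[list(cand)])  # hypothesize
        if P is None:
            continue
        # Verify: every remaining model feature must reproject close
        # to some detected image feature.
        proj = (P @ verify_h.T).T
        proj = proj[:, :2] / proj[:, 2:3]
        dists = np.linalg.norm(proj[:, None, :] - image_pts[None, :, :],
                               axis=2)
        if np.all(dists.min(axis=1) < tol):
            return P, cand          # verified hypothesis
    return None                     # no consistent match found
```

In the paper's setting the hypothesize step is cheap because the coplanarity constraint on the reference features reduces it to a second-order equation, which keeps this exhaustive loop tractable.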

Experiments

The proposed recognition algorithm is tested on both synthetic and real object images. First, to verify the appropriateness of the practical pinhole camera model we employed, a feasibility test is carried out using a calibration box of the kind commonly used for camera calibration; the space coordinates of the white dots printed regularly on its faces are known exactly. In this experiment, five images of different views are

Conclusions

In this paper, by employing a practical CCD camera model for projective image formation and analyzing the constraints among the projection parameters, we have presented new invariant relationships between 3D point and line features and their projections on the image plane. Unlike conventional approaches employing an ideal projection model, we showed that by using the additional constraints that exist among the parameters of a practical camera model, five pairs of space and image features suffice to determine all the projection parameters.

BONG SEOP SONG received the B.S., M.S., and Ph.D. from Seoul National University, Seoul, Korea, 1992, 1994, and 1999, respectively. From 1999 to 2002, he was a senior researcher at Daewoo Electronics Co. Ltd. In 2002, he joined Suprema Inc., in Seoul, where he is currently a technical director. His current research interests are model-based object recognition, applications of invariance in computer vision, and biometrics.




SANG UK LEE received his B.S. degree from Seoul National University, Seoul, Korea, in 1973, the M.S. degree from Iowa State University, Ames in 1976, and Ph.D. degree from University of Southern California, Los Angeles, in 1980, all in electrical engineering. From 1980 to 1981, he was with the General Electric Company, Lynchburg, VA, working on the development of digital mobile radio. From 1981 to 1983, he was a Member of Technical Staff, M/A-COM Research Center, Rockville, MD. In 1983, he joined the Department of Control and Instrumentation engineering at Seoul National University as an Assistant Professor, where he is now a Professor at the School of Electrical Engineering and Computer Science. Currently, he is also affiliated with the Automation and Systems Research Institute and the Institute of New Media and Communications at Seoul National University. His current research interests are in the areas of image and video signal processing, digital communication, and computer vision. He served as an Editor-in-Chief for the Transaction of the Korean Institute of Communication Science from 1994 to 1996. Dr. Lee is currently a Member of Editorial Board of both the Journal of Visual Communication and Image Representation and the Journal of Applied Signal Processing, and an Associate Editor for IEEE Transaction on Circuits and Systems for Video Technology. He is a member of Phi Kappa Phi.

KYOUNG MU LEE received the B.S. and M.S. degrees in Control and Instrumentation Engineering from Seoul National University, Seoul, Korea in 1984 and 1986, respectively, and Ph.D. degree in Electrical Engineering from the University of Southern California, Los Angeles, in 1993. From 1993 to 1994, he was a research associate in the Signal and Image Processing Institute at the University of Southern California. He was with the Samsung Electronics Co. Ltd. at Suwon in Korea as a senior researcher from 1994 to 1995, where he worked on developing industrial real-time vision systems. In 1995, he joined the Department of Electronics and Electrical Engineering of the Hong-Ik University in Seoul, Korea as an Assistant Professor, where he is currently an Associate Professor. Dr. Lee is currently serving as a Member of Editorial Board of the Journal of Applied Signal Processing. His current primary research interests include computational vision, shape from X, 2D and 3D object recognition, human–computer interface, and visual navigation.

IL DONG YUN received his B.S., M.S., and Ph.D. degrees in electrical engineering from Seoul National University, Seoul, Korea, in 1989, 1991, and 1996, respectively. From 1996 to 1997, he was employed at the Daewoo Electronics Inc., Seoul, Korea. In 1997, he joined the School of Electronics and Instrumentation Engineering at Hankuk University of Foreign Studies, as a faculty member, where he is currently an Assistant Professor. His research interests include object recognition, texture segmentation, and 3D shape reconstruction, segmentation, and description.
