3D target recognition using cooperative feature map binding under Markov Chain Monte Carlo
Introduction
The performance of an IR target recognition system for unmanned aerial vehicles depends largely on image quality, target representation, and the matching paradigm. The central issue in target representation is how to cope with the geometrical variations caused by the 3D target pose. There are two approaches to this problem: view-based representation and model-based representation. The view-based approach stores all possible target views (Murase and Nayar, 1995). In more recent work, each target view is represented as a sum of visual parts (Nair and Aggarwal, 2000; Lowe, 2004). These representations are biologically plausible and suitable for target indexing, but do not provide accurate target information, such as the 3D pose. The model-based approach represents a 3D target as a 3D computer-aided design (CAD) model or as voxels, and handles the target pose by controlling the pose parameters of the 3D model (Jain and Dorai, 2000). This representation is suitable for obtaining accurate pose information for artificial IR targets.
The main issue in target matching is how to obtain a correct match between a rendered 3D CAD model and a 2D image in a noisy environment. There are two kinds of noise: thermal noise in the sensor itself, and atmospheric factors such as humidity and temperature, which affect atmospheric transmittance. The matching should be robust to both noise sources. Fig. 1 shows two IR images acquired at the same site under different humidity and temperature conditions (day and night). Note the enormous differences in visual appearance.
There are many descriptor-based matching methods, such as shape context, curvature scale space, and moments (Zhang and Lu, 2004). However, these methods assume that the target objects are already segmented, which is impractical in a real working environment. One successful target recognition method represents the target as a 3D CAD model and recognizes it by matching either edge magnitudes (Der and Chellappa, 1997) or edge orientations (Olson and Huttenlocher, 1997). These approaches are unstable under noise, because they match on a single feature map, and are also very inefficient, because they must search the full pose space, the scale, and the image region. A probabilistic method handles incomplete data corrupted by noise (Hornegger and Niemann, 2000). This method may be an optimal solution, but it is very complex to use. There is also a search space reduction approach that uses multiple hypotheses from angle cues (Shimshoni and Ponce, 2000). This approach is of limited effectiveness because it relies on a simple bottom-up cue.
In this paper, we use a 3D CAD model-based representation suitable for artificial targets such as cars and buildings. Fig. 2 summarizes the issues and the proposed methods for dealing with them. We propose a novel shape-matching method motivated by two properties of human visual perception: feature map binding (Treisman, 1998) and computational Gestalt theory (Desolneux et al., 2004). This matching is robust to noise. The target pose is optimized using Markov Chain Monte Carlo (MCMC) (Dick et al., 2002), a global optimization tool that is known to outperform the genetic algorithm (Doucet et al., 2001). The cost of the pose search is reduced by supplying bottom-up indexing cues to the MCMC.
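The role of MCMC in the pose search can be illustrated with a minimal random-walk Metropolis sketch. This is not the authors' implementation: the function names, the two-parameter pose (yaw, scale), the temperature, and the toy score are all assumptions made for illustration. The score function stands in for whatever shape-matching measure is used, and the starting pose stands in for a bottom-up indexing cue.

```python
import math
import random


def metropolis_pose_search(score, init_pose, step_sizes,
                           n_iter=2000, temperature=0.05, seed=0):
    """Random-walk Metropolis sampler over a pose vector.

    score      -- hypothetical callable: pose tuple -> matching score
                  (higher is better); a stand-in for a shape-matching measure.
    init_pose  -- starting pose, e.g. supplied by a bottom-up cue (assumption).
    step_sizes -- standard deviation of the Gaussian proposal per parameter.
    """
    rng = random.Random(seed)
    current = tuple(init_pose)
    current_score = score(current)
    best, best_score = current, current_score
    for _ in range(n_iter):
        # Symmetric Gaussian proposal centred on the current pose.
        proposal = tuple(p + rng.gauss(0.0, s)
                         for p, s in zip(current, step_sizes))
        proposal_score = score(proposal)
        # Target density is taken proportional to exp(score / temperature).
        log_accept = (proposal_score - current_score) / temperature
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


def toy_score(pose):
    """Toy matching score peaked at yaw = 30 degrees, scale = 1.2 (made up)."""
    yaw, scale = pose
    return -((yaw - 30.0) / 20.0) ** 2 - ((scale - 1.2) / 0.5) ** 2


# Start from a rough initial guess, as a bottom-up cue might provide.
best_pose, best_value = metropolis_pose_search(
    toy_score, init_pose=(20.0, 1.0), step_sizes=(2.0, 0.05))
print(best_pose, best_value)
```

A good starting pose matters here because the chain only explores locally at each step; starting near the correct pose shortens the burn-in that a blind search over the full pose space would need.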
The structure of this paper is as follows. In Section 2, we describe our 2D shape matching method, which is the core component for 3D target recognition. In Section 3, we show how to extend the 2D shape matching to a 3D target recognition system using MCMC, where the initial parameters are estimated from bottom-up inference. In Section 4, we demonstrate the power of our shape matching on various noisy images and present efficient 3D target recognition results using a single image. We conclude in Section 5.
Noise-robust 2D shape matching
It is very important, but difficult, to robustly match a 2D shape model (or rendered 3D CAD model) to IR images, since IR images are sensitive to thermal noise, humidity, and temperature as shown in Fig. 1. (How can you match a 2D roof model to the boxed regions, which show completely different contrast and intensity distribution in a cluttered background?) In this section, we propose a noise-robust shape-matching scheme by incorporating both computational Gestalt theory (Desolneux et al., 2004
Sensor-driven MCMC-based 3D target recognition
This section extends the FIT-based ε-meaningful shape matching to recognizing 3D targets. As we discussed in Section 1, it is important to find a method to robustly estimate 3D target pose under noise. Since we use a model-based 3D target representation, we have to find optimal pose parameters. If we know an initial target pose or matching points, then a linear solution such as nonstochastic pose optimization may be suitable (Drummond and Cipolla, 2002). However, if we do not know the target ID
3D object recognition test using a CCD sensor
First, we tested the algorithm on objects captured using a CCD camera. We built a database of quantized views as explained above.
Fig. 11(a)–(c) show the optimization process. After 40 iterations, optimal object parameters are estimated by the top-down process. Fig. 11(d) shows another top-down optimization result for a milk pack. Note that a very accurate alignment is possible using only a single camera, bottom-up information, and a 3D shape model, using the MCMC statistical method. The
Conclusion
We propose a novel ATR paradigm based on the human visual system, especially cooperative feature map binding, that utilizes both bottom-up and top-down processes, and we demonstrate the system performance via several experiments. The test results on several IR images demonstrate efficient optimal matching and robustness to noise, as well as the feasibility of the proposed recognition paradigm.
Acknowledgements
This research was supported by the Korean Ministry of Science and Technology under the National Research Laboratory Program (Grant number M1-0302-00-0064), Korea.
References (18)
- et al. Bayesian recognition of targets by parts in second generation forward looking infrared images. Image Vision Comput. (2000)
- et al. Review of shape representation and description techniques. Pattern Recognition (2004)
- et al. Probe-based automatic target recognition in infrared imagery. IEEE Trans. Image Process. (1997)
- et al. Gestalt theory and computer vision
- Dick, A.R., Torr, P.H.S., Cipolla, R., 2002. A Bayesian estimation of building shape using MCMC. In: Proceedings of the...
- et al. Sequential Monte Carlo Methods in Practice (2001)
- et al. Real-time tracking of complex structures. IEEE Trans. Pattern Anal. Machine Intell. (2002)
- Fort Carson RSTA Data Collection, Colorado State University Computer Vision Group. Available from:...
- Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination (1996)