3D target recognition using cooperative feature map binding under Markov Chain Monte Carlo

https://doi.org/10.1016/j.patrec.2005.11.008

Abstract

A robust and effective feature map integration method is presented for infrared (IR) target recognition. Noise in an IR image makes a target recognition system unstable in pose estimation and shape matching. Cooperative feature map binding under computational Gestalt theory yields shape matching that remains robust in noisy conditions. The pose of a 3D target is estimated using a Markov Chain Monte Carlo (MCMC) method, a statistical global optimization tool, within which the noise-robust shape matching is used. In addition, bottom-up information accelerates the recognition of 3D targets by providing initial values to the MCMC scheme. Experimental results show that cooperative feature map binding, which analyzes spatial relationships, plays a crucial role in robust shape matching, and that this matching is statistically optimized using the MCMC framework.

Introduction

The performance of an IR target recognition system for unmanned aerial vehicles largely depends on image quality, target representation, and the matching paradigm. The issue in target representation is how to cope with the geometrical variations caused by the 3D target pose. There are two approaches to this problem: view-based representation and model-based representation. The view-based approach stores all possible target views (Murase and Nayar, 1995). In more recent work, each target view is represented as a sum of visual parts (Nair and Aggarwal, 2000; Lowe, 2004). These representations are biologically plausible and suitable for target indexing, but they do not provide accurate target information, such as the 3D pose. The model-based approach represents a 3D target as a 3D computer-aided design (CAD) model or as voxels, and handles the target pose by controlling the pose parameters of the 3D model (Jain and Dorai, 2000). This representation is suitable for obtaining accurate pose information for artificial IR targets.
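To make the idea of "controlling the pose parameters of the 3D model" concrete, the following minimal sketch projects model vertices into the image for a given pose. It is an illustration only: the Z-Y-X Euler angle convention, the pinhole camera with a single focal length, and the names `rotation_matrix` and `project_model` are assumptions, not details taken from the paper.

```python
import math

def rotation_matrix(yaw, pitch, roll):
    """Rotation R = Rz(yaw) * Ry(pitch) * Rx(roll), angles in radians."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]

def project_model(vertices, pose, focal=500.0):
    """Project 3D model vertices to 2D image points.

    pose = (yaw, pitch, roll, tx, ty, tz); an assumed parameterization.
    The camera looks along +z, so tz must place the model in front of it.
    """
    yaw, pitch, roll, tx, ty, tz = pose
    R = rotation_matrix(yaw, pitch, roll)
    points = []
    for x, y, z in vertices:
        # Rigid transform into camera coordinates.
        xc = R[0][0] * x + R[0][1] * y + R[0][2] * z + tx
        yc = R[1][0] * x + R[1][1] * y + R[1][2] * z + ty
        zc = R[2][0] * x + R[2][1] * y + R[2][2] * z + tz
        # Pinhole projection.
        points.append((focal * xc / zc, focal * yc / zc))
    return points

# A hypothetical "roof" edge seen from 10 units away with no rotation.
roof_edge = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
image_points = project_model(roof_edge, (0.0, 0.0, 0.0, 0.0, 0.0, 10.0))
```

Changing any of the six pose values changes the projected outline, which is what a model-based matcher compares against the image.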

The main issue in target matching is how to obtain a correct match between a rendered 3D CAD model and a 2D image in a model-based representation under noisy conditions. There are two kinds of noise: thermal noise in the sensor itself, and atmospheric factors such as humidity and temperature, which affect atmospheric transmittance. The matching should be robust to both noise sources. Fig. 1 shows two IR images acquired at the same site under different humidity and temperature conditions (day and night). Note the large differences in appearance.

There are many descriptor-based matching methods, such as shape context, curvature scale space, and moments (Zhang and Lu, 2004). However, these methods assume that the target objects are already segmented, which is impractical in a real working environment. One successful target recognition method represents the target as a 3D CAD model and recognizes it by matching either edge magnitudes (Der and Chellappa, 1997) or edge orientations (Olson and Huttenlocher, 1997). These approaches are unstable under noise because they match on a single feature map, and they are also very inefficient because they must search the full pose space, the scale, and the image region. A probabilistic method handles incomplete data corrupted by noise (Hornegger and Niemann, 2000); it may be an optimal solution, but it is very complex to use. A search space reduction approach uses multiple hypotheses from angle cues (Shimshoni and Ponce, 2000), but its effectiveness is limited because it relies on a simple bottom-up cue.
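The two feature maps mentioned above, edge magnitude and edge orientation, can be sketched in a few lines. The example below is a generic illustration, not the method of any cited paper and not the binding scheme proposed here: central-difference gradients, the thresholds, and the simple "both maps must agree" rule in the hypothetical `match_score` are all assumptions chosen for brevity.

```python
import math

def gradient_maps(image):
    """Return (magnitude, orientation) maps using central differences."""
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    ori = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = (image[r][c + 1] - image[r][c - 1]) / 2.0
            gy = (image[r + 1][c] - image[r - 1][c]) / 2.0
            mag[r][c] = math.hypot(gx, gy)
            ori[r][c] = math.atan2(gy, gx)
    return mag, ori

def match_score(model_points, mag, ori, mag_thresh=0.25, ang_tol=math.pi / 8):
    """Fraction of model edge points supported by both feature maps.

    model_points: list of (row, col, expected_normal_angle) tuples.
    """
    hits = 0
    for r, c, angle in model_points:
        # Compare orientations modulo pi so contrast polarity is ignored.
        diff = abs((ori[r][c] - angle + math.pi / 2) % math.pi - math.pi / 2)
        if mag[r][c] >= mag_thresh and diff <= ang_tol:
            hits += 1
    return hits / len(model_points)

# Toy 6x6 image with a vertical step edge between columns 2 and 3.
step_image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]
mag_map, ori_map = gradient_maps(step_image)
on_edge = [(r, 3, 0.0) for r in range(1, 5)]       # model placed on the edge
off_edge = [(r, 1, 0.0) for r in range(1, 5)]      # model placed on flat area
score_on = match_score(on_edge, mag_map, ori_map)
score_off = match_score(off_edge, mag_map, ori_map)
```

Comparing orientations modulo pi is one way to tolerate the day and night contrast reversals typical of IR imagery, since a reversed edge keeps its direction but flips its sign.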

In this paper, we use a 3D CAD model-based representation suitable for artificial targets such as cars and buildings. Fig. 2 summarizes the issues and the proposed methods for dealing with them. We propose a novel shape-matching method motivated by two properties of human visual perception: feature map binding (Treisman, 1998) and computational Gestalt theory (Desolneux et al., 2004). This matching is robust to noise. The target pose is optimized using Markov Chain Monte Carlo (MCMC) (Dick et al., 2002), a global optimization tool that is reported to outperform the genetic algorithm (Doucet et al., 2001). The pose search problem is alleviated by supplying bottom-up indexing cues to the MCMC as initial values.
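The role of MCMC as a pose optimizer seeded by a bottom-up cue can be illustrated with a random-walk Metropolis search. This is a minimal sketch under stated assumptions: the Gaussian proposal, the fixed temperature, the function name `metropolis_pose_search`, and the toy quadratic cost are all hypothetical stand-ins, since the proposal distributions and the matching energy of the actual system are not given in this excerpt.

```python
import math
import random

def metropolis_pose_search(cost, init_pose, step=0.05, temperature=0.01,
                           iterations=2000, seed=0):
    """Random-walk Metropolis search over a pose vector.

    cost: hypothetical matching cost (lower is better).
    init_pose: starting pose, e.g. supplied by a bottom-up cue.
    Returns (best_pose, best_cost) seen along the chain.
    """
    rng = random.Random(seed)
    current = list(init_pose)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    for _ in range(iterations):
        # Propose a small Gaussian perturbation of every pose parameter.
        candidate = [p + rng.gauss(0.0, step) for p in current]
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse poses with
        # probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return best, best_cost

# Toy stand-in for a shape-matching cost: squared distance to a known pose.
TRUE_POSE = (0.3, -0.2, 5.0)

def toy_cost(pose):
    return sum((p - t) ** 2 for p, t in zip(pose, TRUE_POSE))

initial = (0.5, 0.0, 4.5)  # pretend this estimate came from a bottom-up cue
best_pose, best_cost = metropolis_pose_search(toy_cost, initial)
```

Occasionally accepting a worse pose is what lets the chain escape local minima of a noisy matching cost, while a good initial pose shortens the time spent wandering before the chain reaches the region of interest.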

The structure of this paper is as follows. In Section 2, we describe our 2D shape matching method, which is the core component for 3D target recognition. In Section 3, we show how to extend the 2D shape matching to a 3D target recognition system using MCMC, where the initial parameters are estimated from bottom-up inference. In Section 4, we demonstrate the power of our shape matching on various noisy images and present efficient 3D target recognition results using a single image. We conclude in Section 5.

Section snippets

Noise-robust 2D shape matching

It is very important, but difficult, to robustly match a 2D shape model (or rendered 3D CAD model) to IR images, since IR images are sensitive to thermal noise, humidity, and temperature as shown in Fig. 1. (How can you match a 2D roof model to the boxed regions, which show completely different contrast and intensity distribution in a cluttered background?) In this section, we propose a noise-robust shape-matching scheme by incorporating both computational Gestalt theory (Desolneux et al., 2004

Sensor-driven MCMC-based 3D target recognition

This section extends the FIT-based ε-meaningful shape matching to recognizing 3D targets. As we discussed in Section 1, it is important to find a method to robustly estimate 3D target pose under noise. Since we use a model-based 3D target representation, we have to find optimal pose parameters. If we know an initial target pose or matching points, then a linear solution such as nonstochastic pose optimization may be suitable (Drummond and Cipolla, 2002). However, if we do not know the target ID

3D object recognition test using a CCD sensor

First, we tested the algorithm on objects captured using a CCD camera. We built a database of quantized views as explained above.

Fig. 11(a)–(c) show the optimization process. After 40 iterations, optimal object parameters are estimated by the top-down process. Fig. 11(d) shows another top-down optimization result for a milk pack. Note that a very accurate alignment is possible using only a single camera, bottom-up information, and a 3D shape model, using the MCMC statistical method. The

Conclusion

We propose a novel automatic target recognition (ATR) paradigm based on the human visual system, in particular cooperative feature map binding, that uses both bottom-up and top-down processes, and we demonstrate the system performance in several experiments. The test results on several IR images demonstrate efficient optimal matching and robustness to noise, as well as the feasibility of the proposed recognition paradigm.

Acknowledgements

This research was supported by the Korean Ministry of Science and Technology under the National Research Laboratory Program (Grant number M1-0302-00-0064).

References (18)

  • D. Nair et al. Bayesian recognition of targets by parts in second generation forward looking infrared images. Image Vision Comput. (2000)
  • D. Zhang et al. Review of shape representation and description techniques. Pattern Recognition (2004)
  • S.Z. Der et al. Probe-based automatic target recognition in infrared imagery. IEEE Trans. Image Process. (1997)
  • A. Desolneux et al. Gestalt theory and computer vision
  • Dick, A.R., Torr, P.H.S., Cipolla, R., 2002. A Bayesian estimation of building shape using MCMC. In: Proceedings of the...
  • A. Doucet et al. Sequential Monte Carlo Methods in Practice (2001)
  • T. Drummond et al. Real-time tracking of complex structures. IEEE Trans. Pattern Anal. Machine Intell. (2002)
  • Fort Carson RSTA Data Collection, Colorado State University Computer Vision Group. Available from:...
  • P. Green. Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination (1996)
There are more references available in the full text version of this article.
