Bayesian recognition of targets by parts in second generation forward looking infrared images
Introduction
Recognizing three-dimensional (3D) objects from two-dimensional (2D) images is an important part of computer vision [1]. The success of most computer vision applications (robotics, automatic target recognition (ATR), surveillance, etc.) is closely tied to reliable recognition of 3D objects or surfaces. The study of object recognition and the development of experimental object recognition systems have had considerable impact on the direction and content of computer vision. Although a plethora of paradigms, algorithms and systems has been proposed over the past two decades, no versatile solution to this problem has been developed; the state of the art consists of partial solutions and limited success in constrained environments. In fact, some researchers believe that it is not possible to design an object recognition system that is both functional for a wide variety of scenes and environments and as efficient as a situation-specific system.
The difficulty in obtaining a general and comprehensive solution to this problem can be attributed to the complexity of object recognition in itself, as it involves processing at all levels of machine vision: lower-level vision, as with edge detection and image segmentation; mid-level vision, as with representation and description of pattern shape, and feature extraction; and higher level vision, as with pattern category assignment, matching and reasoning. The success of an object recognition system depends upon succeeding at all these levels. The task is complicated by several factors, such as not knowing how many objects are present in the image, the possibility that the objects may be occluded, the possibility that unknown objects may appear in the image, the motion of the object, and variations in the sensing environment and in the limits and accuracy of the sensor. In applications such as ATR, where targets must be recognized in complex outdoor scenes under adverse conditions, additional factors such as noise, in the form of clutter, and deliberate misinformation, such as camouflage, mislead the recognition system, making the recognition process even more difficult.
In general, recognition is the process of finding a correspondence between certain features in the image and similar features of the object model [2]. The most important issues involved in the process are: (a) identifying the type of features to use, and (b) determining the best procedure for establishing the correspondence between image and model features. The reliability and efficiency of an object recognition system depend directly on how carefully these issues are addressed. Generally, recognition follows a bottom–up approach, in which features extracted from an image are classified into one of many known object types. However, attempts have also been made to approach the problem from a top–down perspective, in which recognition is performed by determining whether one of the known objects appears in the image. The bottom–up approach is adopted in this paper.
View-based object recognition is often referred to as viewer-centered or 2D object recognition, because direct information about the 3D structure of the object (such as a 3D model) is not available; the only a priori information is in the form of representations of the object viewed at different angles (aspects) and distances. Each representation (or characteristic view) describes the object from a single viewpoint, or from a range of viewpoints yielding similar views. There is evidence showing that object recognition in human vision is viewer-centered rather than object-centered [3].
The characteristic views may be obtained by building a database of images of the object or may be rendered from a 3D model of the object [4], [5]. Matching, in this case, is simpler than in model-based recognition because it involves only a 2D/2D comparison. However, the space required to represent all of the characteristic views of an object tends to be considerable. The number of model features to search among also increases, because each characteristic view can itself be considered a model. The search space can be reduced by grouping similar views [6], [7], [8].
Broadly speaking, there are two ways to approach this problem. The first is based on matching salient information (e.g. corner points, lines, contours) extracted from the image to the information obtained from the image database [1], [9]. Based on the best match, the object is recognized and its pose estimated. The second approach extracts translation, rotation and scale invariant features (such as moment invariants [10], Zernike moments [11] or Fourier descriptors [12]) from each image and compares them to features that have been extracted from example images of all the objects. The comparison is usually done in the form of a classification operation [13]. Zernike moments are used for the recognition experiments presented in this paper.
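As a minimal sketch of the second approach, invariant feature vectors extracted from a query image can be compared to class exemplars by nearest-neighbour search. The 2-D feature values and class names below are invented for illustration; the paper's actual features are the 23-element Zernike vectors described in the experimental section.

```python
import math

def classify_by_invariants(query, exemplars):
    """Nearest-neighbour classification of an invariant feature vector.

    `exemplars` maps a class label to a list of feature vectors extracted
    from example images of that class (hypothetical data layout).
    """
    best_label, best_dist = None, math.inf
    for label, vectors in exemplars.items():
        for v in vectors:
            d = math.dist(query, v)  # Euclidean distance in feature space
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label, best_dist

# Toy 2-D "invariant" feature exemplars for two target classes
exemplars = {
    "tank":  [(0.10, 0.80), (0.12, 0.78)],
    "truck": [(0.55, 0.20), (0.60, 0.25)],
}
label, dist = classify_by_invariants((0.11, 0.79), exemplars)
# label is "tank"; dist is the distance to its closest exemplar
```

In practice the distance metric is often a Mahalanobis or probabilistic measure rather than plain Euclidean distance, which is what the Bayesian formulation described below provides.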
In order to build a system that can succeed in a realistic environment, certain simplifications and assumptions about the environment and the problem are generally made. These simplifications introduce uncertainties into the problem, and these uncertainties can create inaccuracies if they are not represented and handled in a suitable manner.
Bayesian statistics have been used at various stages of the object recognition process to provide a firm theoretical footing as well as to improve performance and incorporate error estimates into the recognition problem. The biggest advantage of a Bayesian (or probabilistic) framework lies in its ability to incorporate uncertainty elegantly into the recognition process. Bayesian approaches also provide error estimates with their decisions, offering another perspective for analyzing such systems. Other advantages of using a Bayesian framework are [14]:
- 1.
Modeling assumptions such as priors, noise distributions, etc. need to be explicitly defined. However, once the assumptions are made, the rules of probability give us a unique answer for any question that is posed.
- 2.
Bayesian inference satisfies the likelihood principle, where decisions are based on data that is seen and not on data that might have occurred but did not.
- 3.
Bayesian model comparison techniques automatically prefer simpler models over complex ones (Occam's razor).
On the other hand, since Bayesian decisions depend heavily on the underlying modeling assumptions, they are very sensitive to the veracity of these assumptions. Usually, a fairly large and representative amount of data is required to get a good description of the model.
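To make the framework concrete, a Bayes-rule decision with an attached error estimate can be sketched in a few lines. The likelihood and prior values below are invented for illustration; in the paper these quantities come from the learned part density models.

```python
def posterior(likelihoods, priors):
    """Posterior class probabilities via Bayes' rule.

    likelihoods[c] = p(data | class c), priors[c] = p(class c).
    Returns the posterior distribution, the MAP decision, and an
    error estimate: 1 - max posterior, i.e. the probability under
    the model that the MAP decision is wrong.
    """
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())            # p(data)
    post = {c: j / evidence for c, j in joint.items()}
    decision = max(post, key=post.get)
    return post, decision, 1.0 - post[decision]

# Invented likelihoods and equal priors for two target classes
post, decision, err = posterior(
    {"tank": 0.6, "truck": 0.1}, {"tank": 0.5, "truck": 0.5}
)
# decision is "tank" with posterior 6/7, so err is 1/7
```

The error estimate is the quantity referred to above: it accompanies every decision and can be thresholded to reject ambiguous inputs.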
Bayesian statistics have been used in object recognition for indexing [15], model matching [16], and incorporating neighborhood relations [17] in different contexts, with some degree of success. This paper describes a Bayesian approach to target recognition using target parts.
Previous work: recognition by parts
Object recognition essentially involves the classification of objects into one of many different a priori known object types, and finding the pose of the object once the classification has been done [18], [19]. This paper addresses the problem of object recognition in second-generation 2D FLIR images. (Second generation refers to the better quality of infrared images acquired using recently developed FLIR sensors.) No information is available about the 3D structure (such as a 3D model [18], [20]) …
Our approach
The recognition system developed in this paper uses target parts for recognition and proposes a hierarchical recognition strategy that uses salient object parts as cues for classification and recognition. The system exploits the advantages of using target parts for recognition, such as reduced model-image feature search space, robust recognition under occlusion, etc.
The use of modular units as the basic building blocks of complex systems [27] has gained popularity. The driving force behind this …
A Bayesian hierarchical modular structure
In this section, a Bayesian realization of the hierarchical modular structure (HMS) recognition system described earlier is presented. In the Bayesian system, the expert modules represent the probability density functions of each part, modeled as a mixture of densities to incorporate different views (aspects) of each part. The Expectation–Maximization (EM) approach is used to compute the parameters of these modules. Results obtained for object recognition in second generation FLIR images are also presented in this section.
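The paper fits multivariate mixture densities to moment features; as a simplified illustration of the EM step, a two-component one-dimensional Gaussian mixture can be fitted as follows. The 1-D restriction and the synthetic data are assumptions made for brevity.

```python
import math

def em_gmm_1d(data, iters=100):
    """EM for a two-component 1-D Gaussian mixture: a sketch of the
    parameter estimation used for the part density experts (the real
    system fits multivariate mixtures over Zernike features)."""
    mu = [min(data), max(data)]          # deterministic initialisation
    var = [1.0, 1.0]
    w = [0.5, 0.5]                       # mixing weights
    n = len(data)
    for _ in range(iters):
        # E-step: responsibility r[i][j] = p(component j | x_i)
        r = []
        for x in data:
            p = [w[j] / math.sqrt(2 * math.pi * var[j])
                 * math.exp(-(x - mu[j]) ** 2 / (2 * var[j]))
                 for j in (0, 1)]
            s = p[0] + p[1]
            r.append([p[0] / s, p[1] / s])
        # M-step: re-estimate weights, means and variances
        for j in (0, 1):
            nj = sum(r[i][j] for i in range(n))
            w[j] = nj / n
            mu[j] = sum(r[i][j] * data[i] for i in range(n)) / nj
            var[j] = max(sum(r[i][j] * (data[i] - mu[j]) ** 2
                             for i in range(n)) / nj, 1e-6)
    return w, mu, var

# Two well-separated clusters of synthetic 1-D "feature" values
data = [0.1, -0.2, 0.05, 5.1, 4.9, 5.2]
w, mu, var = em_gmm_1d(data)
# mu converges near the cluster means (about -0.017 and 5.067)
```

Each mixture component here plays the role of one aspect (view) of a part; the learned mixture is the density that a part expert module evaluates at recognition time.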
Part decomposition of target
Underlying the recognition paradigm presented in this paper is an organization of targets and their parts into classes and hierarchies. This section describes the algorithm developed to identify the parts of an object from its 2D image.
Experimental results
This section presents results of using the Bayesian HMS methodology for target recognition in second generation FLIR images. Once the parts of the target are identified using the algorithms described earlier, each part is normalized for translation and rotation and then represented using Zernike moments [11] up to order 8 (i.e. 23 elements) for recognition (i.e. Y), while seven standard moments [10] are used to determine the pose of a part (i.e. Z). Refer to Appendix A for a description of …
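As an illustration of the moment features involved, the seven standard moment invariants of [10] (Hu's invariants) can be computed directly from a binary silhouette. The small test shape below is hypothetical; the paper applies these moments to segmented target parts.

```python
def hu_moments(img):
    """Hu's seven moment invariants [10] of a binary image given as a
    list of rows of 0/1 pixels.  Invariant to translation, scale and
    rotation (the seventh changes sign under reflection)."""
    h, w = len(img), len(img[0])
    def m(p, q):                          # raw moment m_pq
        return sum(x ** p * y ** q * img[y][x]
                   for y in range(h) for x in range(w))
    m00 = m(0, 0)
    xc, yc = m(1, 0) / m00, m(0, 1) / m00     # centroid
    def eta(p, q):                        # normalised central moment
        mu = sum((x - xc) ** p * (y - yc) ** q * img[y][x]
                 for y in range(h) for x in range(w))
        return mu / m00 ** (1 + (p + q) / 2)
    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    e30, e03, e21, e12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = e30 + e12, e21 + e03           # common sub-expressions
    return [
        e20 + e02,
        (e20 - e02) ** 2 + 4 * e11 ** 2,
        (e30 - 3 * e12) ** 2 + (3 * e21 - e03) ** 2,
        a ** 2 + b ** 2,
        (e30 - 3 * e12) * a * (a ** 2 - 3 * b ** 2)
            + (3 * e21 - e03) * b * (3 * a ** 2 - b ** 2),
        (e20 - e02) * (a ** 2 - b ** 2) + 4 * e11 * a * b,
        (3 * e21 - e03) * a * (a ** 2 - 3 * b ** 2)
            - (e30 - 3 * e12) * b * (3 * a ** 2 - b ** 2),
    ]

# A small asymmetric test silhouette
shape = [[0] * 6 for _ in range(6)]
for x, y in [(1, 1), (2, 1), (1, 2), (1, 3)]:
    shape[y][x] = 1
hu = hu_moments(shape)  # seven invariant features
```

Because the invariants are unchanged by translation and rotation but the underlying central moments are not, the raw moments of a part can be used to recover its orientation (the pose quantity Z above) while the invariants serve as pose-independent descriptors.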
Conclusions
A hierarchical, modular methodology for object recognition by parts has been presented. Recognition is performed at different levels in the hierarchy; the type of recognition varies from level to level. Each level is made up of expert modules that have been trained to recognize one part of an object.
The effectiveness of the proposed methodology has been demonstrated through a Bayesian realization of the system for the recognition of 2D targets. In this system, each part expert module represents …
Acknowledgements
This work was supported in part by the Army Research Office, Contracts DAAH 04-94-G-0417/DAAD-19-99-1-0012 and DAAH 04-95-I-0494.
References (46)
- Strategies of multi-view multi-matching for 3D object recognition, Computer Vision and Image Processing (1993)
- Computer vision for robust 3D aircraft recognition with fast library search, Pattern Recognition (1991)
- Recognition by functional parts, Computer Vision, Graphics and Image Processing (1995)
- Human image understanding: recent research and a theory, Computer Vision, Graphics and Image Processing (1985)
- From volumes to views: an approach to 3-D object recognition, Computer Vision, Graphics and Image Processing: Image Understanding (1992)
- Design and evolution of modular neural network architectures, Neural Networks (1994)
- The ‘module-concept’ in the cerebral cortex architecture, Brain Research (1975)
- Biological shape and visual science, Journal of Theoretical Biology (1973)
- Description and recognition of curved objects, Artificial Intelligence (1977)
- Shape description via the use of critical points, Pattern Recognition (1978)
- An iterative procedure for the polygonal approximation of plane curves, Computer Vision, Graphics and Image Processing
- Symmetry-curvature duality, Computer Vision, Graphics and Image Processing
- CALM: categorizing and learning module, Neural Networks
- Computer recognition of partial views of curved objects, IEEE Transactions on Computers
- Object Recognition by Computer: The Role of Geometric Constraints
- The internal representation of solid shape with respect to vision, Biological Cybernetics
- The automatic construction of a view-independent relational model for 3D object recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence
- Computing exact aspect graphs of curved objects: algebraic surfaces, International Journal of Computer Vision
- Visual pattern recognition by moment invariants, IRE Transactions on Information Theory
- Invariant image recognition by Zernike moments, IEEE Transactions on Pattern Analysis and Machine Intelligence