Bayesian recognition of targets by parts in second generation forward looking infrared images

https://doi.org/10.1016/S0262-8856(99)00084-0

Abstract

This paper presents a system for the recognition of targets in second generation forward looking infrared (FLIR) images. Recognition is based on a methodology for recognizing two-dimensional objects by their parts, built on a hierarchical, modular structure. In its most general form, the lowest level consists of classifiers trained to recognize the class of the input object, while at the next level classifiers are trained to recognize specific objects. At each level, objects are recognized by their parts, and thus each classifier is made up of modules, each of which is an expert on a specific part of the object. Each modular expert is trained to recognize one part under different viewing angles and transformations. A Bayesian realization of the proposed methodology is presented, in which the expert modules represent the probability density functions of each part, modeled as mixtures of densities to incorporate different views (aspects) of each part. Recognition relies on the sequential presentation of the parts to the system, without using any relational information between the parts. A new method to decompose a target into its parts and results obtained for target recognition in second generation FLIR images are also presented.

Introduction

Recognizing three-dimensional (3D) objects from two-dimensional (2D) images is an important part of computer vision [1]. The success of most computer vision applications (robotics, automatic target recognition (ATR), surveillance, etc.) is closely tied to reliable recognition of 3D objects or surfaces. The study of object recognition and the development of experimental object recognition systems have had considerable impact on the direction and content of computer vision. Although a plethora of paradigms, algorithms and systems has been proposed over the past two decades, no versatile solution to this problem has been developed; only partial solutions and limited success in constrained environments are the state of the art. In fact, some researchers believe that it is not possible to design an object recognition system that is both functional for a wide variety of scenes and environments and as efficient as a situation-specific system.

The difficulty in obtaining a general and comprehensive solution to this problem can be attributed to the complexity of object recognition in itself, as it involves processing at all levels of machine vision: lower-level vision, as with edge detection and image segmentation; mid-level vision, as with representation and description of pattern shape, and feature extraction; and higher level vision, as with pattern category assignment, matching and reasoning. The success of an object recognition system depends upon succeeding at all these levels. The task is complicated by several factors, such as not knowing how many objects are present in the image, the possibility that the objects may be occluded, the possibility that unknown objects may appear in the image, the motion of the object, and variations in the sensing environment and in the limits and accuracy of the sensor. In applications such as ATR, where targets must be recognized in complex outdoor scenes under adverse conditions, additional factors such as noise, in the form of clutter, and deliberate misinformation, such as camouflage, mislead the recognition system, making the recognition process even more difficult.

In general, recognition is the process of finding a correspondence between certain features in the image and similar features of the object model [2]. The most important issues involved in the process are: (a) identifying the type of features to use, and (b) determining the best procedure to establish the correspondence between image and model features. The reliability and efficiency of an object recognition system depend directly on how carefully these issues are addressed. Generally, recognition follows a bottom–up approach, in which features extracted from an image are classified into one of many object types. However, attempts have also been made to approach the problem from a top–down perspective, where recognition is performed by determining whether one of the many known objects appears in the image. The bottom–up approach is adopted in this paper.

View-based object recognition is often referred to as viewer-centered or 2D object recognition, because direct information about the 3D structure of the object (such as a 3D model) is not available; the only a priori information is in the form of representations of the object viewed at different angles (aspects) and distances. Each representation (or characteristic view) describes the object from a single viewpoint, or from a range of viewpoints yielding similar views. There is evidence showing that object recognition in human vision is viewer-centered rather than object-centered [3].

The characteristic views may be obtained by building a database of images of the object, or may be rendered from a 3D model of the object [4], [5]. Matching, in this case, is simpler than in model-based recognition because it involves only a 2D/2D comparison. However, the space requirements for representing all of the characteristic views of an object tend to be considerable. The number of model features to search among also increases, because each characteristic view can be considered a model in its own right. Methods to reduce this search space by grouping similar views have been proposed [6], [7], [8].

Broadly speaking, there are two ways to approach this problem. The first is based on matching salient information (e.g. corner points, lines, contours etc.) that has been extracted from the image to the information obtained from the image database [1], [9]. Based on the best match, the object is recognized and its pose estimated. The second approach extracts translation, rotation and scale invariant features (such as moment invariants [10], Zernike moments [11] or Fourier descriptors [12]) from each image and compares them to the features that have been extracted from example images of all the objects. The comparison is usually done in the form of a classification operation [13]. Zernike moments are used for the recognition experiments presented in this paper.
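As a minimal, hedged illustration of the invariant-feature approach (not the paper's implementation, which uses Zernike moments), the sketch below computes the first Hu moment invariant phi1 = eta20 + eta02 for a binary shape and a 90-degree rotated copy; the value is unchanged by rotation, which is what makes such features usable as inputs to a classifier:

```python
def phi1(img):
    """First Hu moment invariant phi1 = eta20 + eta02 of a binary image."""
    m00 = mx = my = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            m00 += v
            mx += x * v
            my += y * v
    xc, yc = mx / m00, my / m00          # centroid
    mu20 = mu02 = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            mu20 += (x - xc) ** 2 * v    # central moments
            mu02 += (y - yc) ** 2 * v
    # normalized central moments: eta_pq = mu_pq / m00 ** ((p + q) / 2 + 1)
    return (mu20 + mu02) / m00 ** 2

def rot90(img):
    """Rotate a 2-D list-of-lists image by 90 degrees."""
    return [list(row) for row in zip(*img[::-1])]

# a small asymmetric binary shape (an "L")
shape = [[1 if (5 <= x < 8 and 3 <= y < 12) or (5 <= x < 13 and 3 <= y < 5)
          else 0 for x in range(16)] for y in range(16)]

print(phi1(shape), phi1(rot90(shape)))  # equal up to floating-point error
```

Higher-order invariants (and the Zernike moments used in the paper) follow the same pattern: moments of the normalized shape combined so that translation, rotation and scale cancel out.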

In order to build a system that can succeed in a realistic environment, certain simplifications and assumptions about the environment and the problem are generally made. These simplifications introduce uncertainties into the problem, which can lead to inaccuracies if they are not represented and handled in a suitable manner.

Bayesian statistics have been used at various stages of the object recognition process to provide a firm theoretical footing as well as to improve performance and incorporate error estimates into the recognition problem. The biggest advantage of a Bayesian (or probabilistic) framework is in its ability to incorporate uncertainty elegantly into the recognition process. Bayesian approaches also provide error estimates with their decisions, which give another perspective for analyzing systems. Other advantages of using a Bayesian framework are [14]:

  1. Modeling assumptions, such as priors and noise distributions, must be defined explicitly. Once the assumptions are made, however, the rules of probability give a unique answer to any question that is posed.

  2. Bayesian inference satisfies the likelihood principle: decisions are based on the data that were observed, not on data that might have occurred but did not.

  3. Bayesian model comparison techniques automatically prefer simpler models over more complex ones (Occam's razor).

On the other hand, since Bayesian decisions depend heavily on the underlying modeling assumptions, they are very sensitive to the veracity of these assumptions. Usually, a fairly large and representative amount of data is required to get a good description of the model.
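As a concrete, minimal sketch of the framework described above (the two classes, priors and Gaussian likelihoods here are hypothetical, not from the paper), Bayes' rule turns explicitly stated modeling assumptions into a posterior that yields both a decision and an error estimate:

```python
import math

def gaussian_pdf(y, mean, var):
    """Class-conditional likelihood p(y | c) under a Gaussian assumption."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior(y, priors, means, variances):
    """Posterior P(c | y) via Bayes' rule with explicit modeling assumptions."""
    likelihoods = [gaussian_pdf(y, m, v) for m, v in zip(means, variances)]
    joint = [p * l for p, l in zip(priors, likelihoods)]
    z = sum(joint)                       # evidence p(y)
    return [j / z for j in joint]

# two hypothetical target classes with assumed feature distributions
priors = [0.5, 0.5]
means = [0.0, 3.0]
variances = [1.0, 1.0]

post = posterior(2.5, priors, means, variances)
decision = max(range(2), key=lambda c: post[c])
error_estimate = 1.0 - post[decision]   # Bayesian error estimate for this decision
```

Note how the sensitivity to assumptions mentioned above is visible here: changing the assumed means or priors directly changes both the decision and its error estimate.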

Bayesian statistics have been used in object recognition for indexing [15], model matching [16], and incorporating neighborhood relations [17], in different contexts and with some degree of success. This paper describes a Bayesian approach to target recognition using target parts.

Section snippets

Previous work: recognition by parts

Object recognition essentially involves the classification of objects into one of many different a priori known object types, and finding the pose of the object once the classification has been done [18], [19]. This paper addresses the problem of object recognition in second-generation 2D FLIR images. (Second generation refers to the better quality of infrared images acquired using recently developed FLIR sensors.) No information is available about the 3D structure (such as a 3D model [18], [20]

Our approach

The recognition system developed in this paper uses target parts for recognition and proposes a hierarchical recognition strategy that uses salient object parts as cues for classification and recognition. The system exploits the advantages of using target parts for recognition, such as reduced model-image feature search space, robust recognition under occlusion, etc.

The use of modular units as the basic building blocks of complex systems [27] has gained popularity. The driving force behind this

A Bayesian hierarchical modular structure

In this section a Bayesian realization of the HMS recognition system described earlier is presented. In the Bayesian system, the expert modules represent the probability density functions of each part, modeled as a mixture of densities to incorporate different views (aspects) of each part. The Expectation–Maximization (EM) approach is used to compute the parameters of these modules. Results obtained for object recognition in second generation FLIR images are also presented in this section.
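As an illustrative sketch of the estimation machinery mentioned here (the actual part experts model Zernike-moment features, not the hypothetical 1-D data below), the following implements EM for a two-component Gaussian mixture:

```python
import math
import random

def em_gmm_1d(data, n_iter=60):
    """EM for a 2-component 1-D Gaussian mixture (illustrative sketch)."""
    # crude initialization from the data range
    mu = [min(data), max(data)]
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibilities r[i][k] = P(component k | x_i)
        resp = []
        for x in data:
            p = [w[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate weights, means and variances from responsibilities
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for x, r in zip(data, resp)) / nk
            var[k] = max(var[k], 1e-6)   # guard against component collapse
    return w, mu, var

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(300)] + \
       [random.gauss(5.0, 1.0) for _ in range(300)]
w, mu, var = em_gmm_1d(data)
```

In the paper's setting, each mixture component plays the role of one aspect (view) of a part, so a fitted mixture is one part-expert density.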

Part decomposition of target

Underlying the recognition paradigm presented in this paper is an organization of targets and their parts into classes and hierarchies. This section describes the algorithm developed to identify the parts of an object from its 2D image.
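The decomposition algorithm itself is only summarized in this snippet. One standard ingredient for boundary-based decomposition, cited in the references (Ramer's iterative procedure), is polygonal approximation of the target silhouette; the recursive sketch below is our illustration of that ingredient, not the paper's exact procedure:

```python
def ramer(points, eps):
    """Ramer's recursive polygonal approximation of an open curve.

    Keeps the endpoints and recursively splits at the point farthest
    from the chord, until every point lies within eps of the polygon.
    """
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = (dx * dx + dy * dy) ** 0.5 or 1.0
    # perpendicular distance of each interior point to the chord
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        px, py = points[i]
        d = abs(dy * (px - x1) - dx * (py - y1)) / norm
        if d > dmax:
            dmax, idx = d, i
    if dmax <= eps:
        return [points[0], points[-1]]
    left = ramer(points[:idx + 1], eps)
    right = ramer(points[idx:], eps)
    return left[:-1] + right  # merge, dropping the duplicated split point

# an L-shaped boundary sampled point by point
curve = [(i, 0) for i in range(10)] + [(9, j) for j in range(1, 10)]
approx = ramer(curve, eps=0.5)  # reduces to the three corner points
```

The polygon vertices obtained this way are natural candidates for the boundary points at which a silhouette is cut into parts.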

Experimental results

This section presents results of using the Bayesian HMS methodology for target recognition in second generation FLIR images. Once the parts of the target are identified using the algorithms described earlier, each part is normalized for translation and rotation and then represented using Zernike moments [11] up to order 8 (i.e. 23 elements) for recognition (i.e. Y), while seven standard moments [10] were used to determine the pose of a part (i.e. Z). Refer to Appendix A for a description of
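The paper states that parts are presented sequentially and combined without relational information. Under an independence assumption between parts (our reading; this snippet does not spell out the combination rule), the per-part log-likelihoods produced by the expert modules can be pooled into a target posterior. The sketch below is illustrative, with hypothetical values:

```python
import math

def combine_parts(part_log_likelihoods, log_priors):
    """Combine independent per-part log-likelihoods into a target posterior.

    part_log_likelihoods[p][c] is log p(features of part p | target c),
    as a part-expert mixture model would produce. Treating parts as
    independent is an assumption (no relational information is used).
    """
    n_classes = len(log_priors)
    score = list(log_priors)
    for part in part_log_likelihoods:
        for c in range(n_classes):
            score[c] += part[c]
    # normalize in log space for numerical stability
    z = max(score)
    probs = [math.exp(s - z) for s in score]
    total = sum(probs)
    return [p / total for p in probs]

# hypothetical log-likelihoods for 3 observed parts and 2 candidate targets
parts = [[-1.0, -2.0], [-0.5, -3.0], [-2.0, -2.1]]
post = combine_parts(parts, log_priors=[math.log(0.5)] * 2)
```

Because the combination is sequential, the posterior can be updated as each new part arrives, which matches the sequential presentation described in the abstract.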

Conclusions

A hierarchical, modular methodology for object recognition by parts has been presented. Recognition is performed at different levels in the hierarchy; the type of recognition varies from level to level. Each level is made up of expert modules that have been trained to recognize one part of an object.

The effectiveness of the proposed methodology has been demonstrated through a Bayesian realization of the system for the recognition of 2D targets. In this system, each part expert module represents

Acknowledgements

This work was supported in part by the Army Research Office, Contracts DAAH 04-94-G-0417/DAAD-19-99-1-0012 and DAAH 04-95-I-0494.

References (46)

  • U. Ramer

    An iterative procedure for the polygonal approximation of plane curves

    Computer Graphics and Image Processing

    (1972)
  • M. Leyton

    Symmetry-curvature duality

    Computer Vision, Graphics and Image Processing

    (1987)
  • J.M.J. Murre et al.

    CALM: categorizing and learning module

    Neural Networks

    (1992)
  • J.W. McKee et al.

    Computer recognition of partial views of curved objects

    IEEE Transactions on Computers

    (1977)
  • W.E.L. Grimson

    Object Recognition by Computer: The Role of Geometric Constraints

    (1990)
  • J. Koenderink et al.

    The internal representation of solid shape with respect to vision

    Biological Cybernetics

    (1979)
  • A. Pathak, O.I. Camps, Bayesian view class determination, in: IEEE Conference on Computer Vision and Pattern...
  • S. Zhang et al.

    The automatic construction of a view-independent relational model for 3D object recognition

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (1993)
  • J.B. Burns, E.M. Riseman, Matching complex images to multiple 3D objects using view description networks, in: IEEE...
  • S. Petitjean et al.

    Computing exact aspect graphs of curved objects: algebraic surfaces

    International Journal of Computer Vision

    (1992)
  • A. Pope, Model-based object recognition: a survey of recent research, Technical Report TR-94-04,...
  • M. Hu

    Visual pattern recognition by moment invariants

    IRE Transactions on Information Theory

    (1962)
  • A. Khotanzad et al.

    Invariant image recognition by Zernike moments

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (1990)