Bayesian recognition of targets by parts in second generation forward looking infrared images
Introduction
Recognizing three-dimensional (3D) objects from two-dimensional (2D) images is an important part of computer vision [1]. The success of most computer vision applications (robotics, automatic target recognition (ATR), surveillance, etc.) is closely tied to reliable recognition of 3D objects or surfaces. The study of object recognition and the development of experimental object recognition systems have had considerable impact on the direction and content of computer vision. Although a plethora of paradigms, algorithms and systems has been proposed over the past two decades, no versatile solution to this problem has been developed; the state of the art consists of partial solutions and limited success in constrained environments. In fact, some researchers believe that it is not possible to design an object recognition system that is both functional for a wide variety of scenes and environments and as efficient as a situation-specific system.
The difficulty in obtaining a general and comprehensive solution to this problem can be attributed to the complexity of object recognition in itself, as it involves processing at all levels of machine vision: lower-level vision, as with edge detection and image segmentation; mid-level vision, as with representation and description of pattern shape, and feature extraction; and higher level vision, as with pattern category assignment, matching and reasoning. The success of an object recognition system depends upon succeeding at all these levels. The task is complicated by several factors, such as not knowing how many objects are present in the image, the possibility that the objects may be occluded, the possibility that unknown objects may appear in the image, the motion of the object, and variations in the sensing environment and in the limits and accuracy of the sensor. In applications such as ATR, where targets must be recognized in complex outdoor scenes under adverse conditions, additional factors such as noise, in the form of clutter, and deliberate misinformation, such as camouflage, mislead the recognition system, making the recognition process even more difficult.
In general, recognition is the process of finding a correspondence between certain features in the image and similar features of the object model [2]. The most important issues involved in the process are: (a) identifying the type of features to use, and (b) determining the best procedure for establishing the correspondence between image and model features. The reliability and efficiency of an object recognition system depend directly on how carefully these issues are addressed. Generally, recognition follows a bottom–up approach, in which features extracted from an image are classified into one of many known object types. However, attempts have also been made to approach the problem from a top–down perspective, in which recognition is performed by determining whether one of the known objects appears in the image. The bottom–up approach is adopted in this paper.
View-based object recognition is often referred to as viewer-centered or 2D object recognition, because direct information about the 3D structure of the object (such as a 3D model) is not available; the only a priori information is in the form of representations of the object viewed at different angles (aspects) and distances. Each representation (or characteristic view) describes the object from a single viewpoint, or from a range of viewpoints yielding similar views. There is evidence showing that object recognition in human vision is viewer-centered rather than object-centered [3].
The characteristic views may be obtained by building a database of images of the object or may be rendered from a 3D model of the object [4], [5]. Matching, in this case, is simpler than in model-based recognition because it involves only a 2D/2D comparison. However, the space required to represent all of the characteristic views of an object tends to be considerable. The number of model features to search among also increases, because each characteristic view can itself be considered a model. The search space can be reduced by grouping similar views [6], [7], [8].
Broadly speaking, there are two ways to approach this problem. The first is based on matching salient information (e.g. corner points, lines, contours) extracted from the image to the information obtained from the image database [1], [9]. Based on the best match, the object is recognized and its pose estimated. The second approach extracts translation, rotation and scale invariant features (such as moment invariants [10], Zernike moments [11] or Fourier descriptors [12]) from each image and compares them to features that have been extracted from example images of all the objects. The comparison is usually done in the form of a classification operation [13]. Zernike moments are used for the recognition experiments presented in this paper.
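As a minimal sketch of the second approach, invariant feature vectors extracted from a query image can be compared to class exemplars by nearest-neighbour search. The 2-D feature values and class names below are invented for illustration; the paper's actual features are the 23-element Zernike vectors described in the experimental section.

```python
import math

def classify_by_invariants(query, exemplars):
    """Nearest-neighbour classification of an invariant feature vector.

    `exemplars` maps a class label to a list of feature vectors extracted
    from example images of that class (hypothetical data layout).
    """
    best_label, best_dist = None, math.inf
    for label, vectors in exemplars.items():
        for v in vectors:
            d = math.dist(query, v)  # Euclidean distance in feature space
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label, best_dist

# Toy 2-D "invariant" feature exemplars for two target classes
exemplars = {
    "tank":  [(0.10, 0.80), (0.12, 0.78)],
    "truck": [(0.55, 0.20), (0.60, 0.25)],
}
label, dist = classify_by_invariants((0.11, 0.79), exemplars)
# label is "tank"; dist is the distance to its closest exemplar
```

In practice the distance metric is often a Mahalanobis or probabilistic measure rather than plain Euclidean distance, which is what the Bayesian formulation described below provides.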
In order to build a system that can succeed in a realistic environment, certain simplifications and assumptions about the environment and the problem are generally made. These simplifications introduce uncertainties into the problem, and these uncertainties can create inaccuracies if they are not represented and handled in a suitable manner.
Bayesian statistics have been used at various stages of the object recognition process to provide a firm theoretical footing as well as to improve performance and incorporate error estimates into the recognition problem. The biggest advantage of a Bayesian (or probabilistic) framework lies in its ability to incorporate uncertainty elegantly into the recognition process. Bayesian approaches also provide error estimates with their decisions, offering another perspective for analyzing such systems. Other advantages of using a Bayesian framework are [14]:
- 1.
Modeling assumptions such as priors, noise distributions, etc. need to be explicitly defined. However, once the assumptions are made, the rules of probability give us a unique answer for any question that is posed.
- 2.
Bayesian inference satisfies the likelihood principle, where decisions are based on data that is seen and not on data that might have occurred but did not.
- 3.
Bayesian model comparison techniques automatically prefer simpler models over complex ones (Occam's razor).
On the other hand, since Bayesian decisions depend heavily on the underlying modeling assumptions, they are very sensitive to the veracity of these assumptions. Usually, a fairly large and representative amount of data is required to get a good description of the model.
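To make the framework concrete, a Bayes-rule decision with an attached error estimate can be sketched in a few lines. The likelihood and prior values below are invented for illustration; in the paper these quantities come from the learned part density models.

```python
def posterior(likelihoods, priors):
    """Posterior class probabilities via Bayes' rule.

    likelihoods[c] = p(data | class c), priors[c] = p(class c).
    Returns the posterior distribution, the MAP decision, and an
    error estimate: 1 - max posterior, i.e. the probability under
    the model that the MAP decision is wrong.
    """
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())            # p(data)
    post = {c: j / evidence for c, j in joint.items()}
    decision = max(post, key=post.get)
    return post, decision, 1.0 - post[decision]

# Invented likelihoods and equal priors for two target classes
post, decision, err = posterior(
    {"tank": 0.6, "truck": 0.1}, {"tank": 0.5, "truck": 0.5}
)
# decision is "tank" with posterior 6/7, so err is 1/7
```

The error estimate is the quantity referred to above: it accompanies every decision and can be thresholded to reject ambiguous inputs.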
Bayesian statistics have been used in object recognition for indexing [15], model matching [16], and incorporating neighborhood relations [17] in different contexts, with some degree of success. This paper describes a Bayesian approach to target recognition using target parts.
Previous work: recognition by parts
Object recognition essentially involves the classification of objects into one of many different a priori known object types, and finding the pose of the object once the classification has been done [18], [19]. This paper addresses the problem of object recognition in second-generation 2D FLIR images. (Second generation refers to the better quality of infrared images acquired using recently developed FLIR sensors.) No information is available about the 3D structure (such as a 3D model [18], [20]) …
Our approach
The recognition system developed in this paper uses target parts for recognition and proposes a hierarchical recognition strategy that uses salient object parts as cues for classification and recognition. The system exploits the advantages of using target parts for recognition, such as reduced model-image feature search space, robust recognition under occlusion, etc.
The use of modular units as the basic building blocks of complex systems [27] has gained popularity. The driving force behind this …
A Bayesian hierarchical modular structure
In this section, a Bayesian realization of the hierarchical modular structure (HMS) recognition system described earlier is presented. In the Bayesian system, the expert modules represent the probability density functions of each part, modeled as a mixture of densities to incorporate different views (aspects) of each part. The Expectation–Maximization (EM) approach is used to compute the parameters of these modules. Results obtained for object recognition in second generation FLIR images are also presented in this section.
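The paper fits multivariate mixture densities to moment features; as a simplified illustration of the EM step, a two-component one-dimensional Gaussian mixture can be fitted as follows. The 1-D restriction and the synthetic data are assumptions made for brevity.

```python
import math

def em_gmm_1d(data, iters=100):
    """EM for a two-component 1-D Gaussian mixture: a sketch of the
    parameter estimation used for the part density experts (the real
    system fits multivariate mixtures over Zernike features)."""
    mu = [min(data), max(data)]          # deterministic initialisation
    var = [1.0, 1.0]
    w = [0.5, 0.5]                       # mixing weights
    n = len(data)
    for _ in range(iters):
        # E-step: responsibility r[i][j] = p(component j | x_i)
        r = []
        for x in data:
            p = [w[j] / math.sqrt(2 * math.pi * var[j])
                 * math.exp(-(x - mu[j]) ** 2 / (2 * var[j]))
                 for j in (0, 1)]
            s = p[0] + p[1]
            r.append([p[0] / s, p[1] / s])
        # M-step: re-estimate weights, means and variances
        for j in (0, 1):
            nj = sum(r[i][j] for i in range(n))
            w[j] = nj / n
            mu[j] = sum(r[i][j] * data[i] for i in range(n)) / nj
            var[j] = max(sum(r[i][j] * (data[i] - mu[j]) ** 2
                             for i in range(n)) / nj, 1e-6)
    return w, mu, var

# Two well-separated clusters of synthetic 1-D "feature" values
data = [0.1, -0.2, 0.05, 5.1, 4.9, 5.2]
w, mu, var = em_gmm_1d(data)
# mu converges near the cluster means (about -0.017 and 5.067)
```

Each mixture component here plays the role of one aspect (view) of a part; the learned mixture is the density that a part expert module evaluates at recognition time.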
Part decomposition of target
Underlying the recognition paradigm presented in this paper is an organization of targets and their parts into classes and hierarchies. This section describes the algorithm developed to identify the parts of an object from its 2D image.
Experimental results
This section presents results of using the Bayesian HMS methodology for target recognition in second generation FLIR images. Once the parts of the target are identified using the algorithms described earlier, each part is normalized for translation and rotation and then represented using Zernike moments [11] up to order 8 (i.e. 23 elements) for recognition (i.e. Y), while seven standard moments [10] are used to determine the pose of a part (i.e. Z). Refer to Appendix A for a description of …
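As an illustration of the moment features involved, the seven standard moment invariants of [10] (Hu's invariants) can be computed directly from a binary silhouette. The small test shape below is hypothetical; the paper applies these moments to segmented target parts.

```python
def hu_moments(img):
    """Hu's seven moment invariants [10] of a binary image given as a
    list of rows of 0/1 pixels.  Invariant to translation, scale and
    rotation (the seventh changes sign under reflection)."""
    h, w = len(img), len(img[0])
    def m(p, q):                          # raw moment m_pq
        return sum(x ** p * y ** q * img[y][x]
                   for y in range(h) for x in range(w))
    m00 = m(0, 0)
    xc, yc = m(1, 0) / m00, m(0, 1) / m00     # centroid
    def eta(p, q):                        # normalised central moment
        mu = sum((x - xc) ** p * (y - yc) ** q * img[y][x]
                 for y in range(h) for x in range(w))
        return mu / m00 ** (1 + (p + q) / 2)
    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    e30, e03, e21, e12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = e30 + e12, e21 + e03           # common sub-expressions
    return [
        e20 + e02,
        (e20 - e02) ** 2 + 4 * e11 ** 2,
        (e30 - 3 * e12) ** 2 + (3 * e21 - e03) ** 2,
        a ** 2 + b ** 2,
        (e30 - 3 * e12) * a * (a ** 2 - 3 * b ** 2)
            + (3 * e21 - e03) * b * (3 * a ** 2 - b ** 2),
        (e20 - e02) * (a ** 2 - b ** 2) + 4 * e11 * a * b,
        (3 * e21 - e03) * a * (a ** 2 - 3 * b ** 2)
            - (e30 - 3 * e12) * b * (3 * a ** 2 - b ** 2),
    ]

# A small asymmetric test silhouette
shape = [[0] * 6 for _ in range(6)]
for x, y in [(1, 1), (2, 1), (1, 2), (1, 3)]:
    shape[y][x] = 1
hu = hu_moments(shape)  # seven invariant features
```

Because the invariants are unchanged by translation and rotation but the underlying central moments are not, the raw moments of a part can be used to recover its orientation (the pose quantity Z above) while the invariants serve as pose-independent descriptors.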
Conclusions
A hierarchical, modular methodology for object recognition by parts has been presented. Recognition is performed at different levels in the hierarchy; the type of recognition varies from level to level. Each level is made up of expert modules that have been trained to recognize one part of an object.
The effectiveness of the proposed methodology has been demonstrated through a Bayesian realization of the system for the recognition of 2D targets. In this system, each part expert module represents …
Acknowledgements
This work was supported in part by the Army Research Office, Contracts DAAH 04-94-G-0417/DAAD-19-99-1-0012 and DAAH 04-95-I-0494.
References (46)
- Strategies of multi-view multi-matching for 3D object recognition, Computer Vision and Image Processing (1993)
- Computer vision for robust 3D aircraft recognition with fast library search, Pattern Recognition (1991)
- Recognition by functional parts, Computer Vision, Graphics and Image Processing (1995)
- Human image understanding: recent research and a theory, Computer Vision, Graphics and Image Processing (1985)
- From volumes to views: an approach to 3-D object recognition, Computer Vision, Graphics and Image Processing: Image Understanding (1992)
- Design and evolution of modular neural network architectures, Neural Networks (1994)
- The ‘module-concept’ in the cerebral cortex architecture, Brain Research (1975)
- Biological shape and visual science, Journal of Theoretical Biology (1973)
- Description and recognition of curved objects, Artificial Intelligence (1977)
- Shape description via the use of critical points, Pattern Recognition (1978)
- An iterative procedure for the polygonal approximation of plane curves, Computer Vision, Graphics and Image Processing
- Symmetry-curvature duality, Computer Vision, Graphics and Image Processing
- CALM: categorizing and learning module, Neural Networks
- Computer recognition of partial views of curved objects, IEEE Transactions on Computers
- Object Recognition by Computer: The Role of Geometric Constraints
- The internal representation of solid shape with respect to vision, Biological Cybernetics
- The automatic construction of a view-independent relational model for 3D object recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence
- Computing exact aspect graphs of curved objects: algebraic surfaces, International Journal of Computer Vision
- Visual pattern recognition by moment invariants, IRE Transactions on Information Theory
- Invariant image recognition by Zernike moments, IEEE Transactions on Pattern Analysis and Machine Intelligence