A novel biologically inspired ELM-based network for image recognition
Introduction
Object recognition is an intensively studied and very challenging task in computer vision, whereas human vision, with its unique processing mechanisms, recognizes objects rapidly, accurately, and effortlessly. Recognizing objects in images is difficult because of variations in illumination, viewpoint, occlusion, scale, and shift. Object categorization is difficult because it must capture the variability in appearance and shape of different objects belonging to the same class while avoiding confusion between objects from different classes. Overcoming these obstacles to achieve robust object recognition would benefit many fields and applications, such as security surveillance, manufacturing, robot navigation, character recognition, and clinical image understanding.
Much research has been devoted to object recognition. Treiber's book [1] gives a good overview of object recognition algorithms used in various applications, including global approaches, transformation-search-based methods, geometrical model driven methods, 3D object recognition schemes, flexible contour fitting algorithms, and descriptor-based methods. Belongie [2] used a set of discrete points sampled from the contour of a shape as a shape descriptor and then K nearest neighbors (KNN) for classification. Mohan [3] built a parts-based detector that represents the image with Haar wavelets and then uses a support vector machine (SVM) for classification. Lowe [4] developed an image feature called the scale-invariant feature transform (SIFT), which is not an object recognition algorithm by itself but became the basis for the features in many object recognition algorithms. Laptev [5] used histograms as features, a weighted Fisher linear discriminant as a weak classifier, and AdaBoost for classification. Biologically motivated features based on Gabor filters and MAX operations have also been developed [6], [7]. Although these algorithms have improved object recognition performance, none of the algorithms available today surpasses the human brain. This suggests that more work is needed to solve the many problems of robust intelligence at which the human brain excels, and thereby to enhance object recognition performance.
Object recognition in the human brain is largely invariant to changes in the size, position, and viewpoint of the object. Through millions of years of evolution, the human brain has achieved a remarkable ability to recognize objects in a robust, selective, and fast manner. It is likely that, once we understand how the neuronal circuitry achieves these properties, it will be possible to translate the biological circuits into algorithms for computer vision and pattern recognition. A hierarchical cortex-based model, named Hierarchical Model and X (HMAX) [8], [9], has attracted much attention because it focuses on designing simple and complex operations inspired by the visual cortex, and because it has been shown in [10] that HMAX can provide robust representations of specific images, outperforming state-of-the-art features such as SIFT in various invariance tasks on synthetic images. More recently, Huang et al. [11] proposed the extreme learning machine (ELM), a learning algorithm for single-hidden-layer feed-forward networks (SLFNs) that can be applied to regression and classification problems [12]. In [13], it was successfully applied to face recognition, where it improved the recognition accuracy.
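To make the idea of simple and complex operations concrete, the following is a minimal sketch of an S1-style Gabor filtering step followed by a C1-style local MAX pooling step. It is an illustration under stated assumptions rather than the implementation used in this paper: the function names (`gabor_kernel`, `s1_response`, `c1_response`) are hypothetical, the kernel size, wavelength, and pooling size are illustrative values, and pooling over neighboring scales, which HMAX-type models also use, is omitted for brevity.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gabor_kernel(size=7, wavelength=4.0, sigma=2.8, theta=0.0, gamma=0.3):
    """Zero-mean, unit-norm Gabor kernel (all parameter values are illustrative)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate frame by the preferred orientation theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
    kernel = envelope * np.cos(2.0 * np.pi * xr / wavelength)
    kernel -= kernel.mean()
    return kernel / np.linalg.norm(kernel)


def s1_response(image, kernel):
    """Simple-cell stage: absolute filter response at every valid position."""
    windows = sliding_window_view(image, kernel.shape)
    return np.abs(np.einsum("ijkl,kl->ij", windows, kernel))


def c1_response(s1, pool=4):
    """Complex-cell stage: MAX over non-overlapping pool x pool neighborhoods."""
    h = (s1.shape[0] // pool) * pool
    w = (s1.shape[1] // pool) * pool
    blocks = s1[:h, :w].reshape(h // pool, pool, w // pool, pool)
    return blocks.max(axis=(1, 3))


if __name__ == "__main__":
    # Synthetic grating whose intensity varies along the horizontal axis.
    x = np.arange(32)
    image = np.tile(np.cos(2.0 * np.pi * x / 4.0), (32, 1))
    for theta in (0.0, np.pi / 2):
        c1 = c1_response(s1_response(image, gabor_kernel(theta=theta)))
        print(f"theta={theta:.2f}  mean C1 response={c1.mean():.3f}")
```

In this sketch the filter whose orientation matches the grating responds much more strongly than the orthogonal one, and the MAX pooling keeps that response stable under small shifts of the pattern, which is the kind of local invariance the complex operation is meant to provide.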
To improve the performance of image-based object recognition, this paper brings together two biologically inspired algorithms, HMAX and ELM, to construct a novel biologically inspired network for image recognition. Because HMAX features offer good scale and translation invariance, the four-layer HMAX model is employed for feature construction, selection, and extraction, and provides a robust feature representation of a specific object image. ELM is introduced to classify the feature representations because it performs better than conventional methods such as SVM and has an extremely fast learning speed, which is akin to the fast learning mechanism of the higher cortical areas. Four groups of experiments are performed on three datasets to demonstrate the novelty and superiority of the proposed network over existing algorithms.
The rest of the paper is organized as follows. Section 2 states the problem and strategy of object recognition. Section 3 presents preliminary information about HMAX and ELM. Section 4 details the proposed biologically inspired ELM-based image recognition algorithm. Section 5 reports several experiments, followed by results and discussion. Section 6 concludes the paper.
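The fast learning speed of ELM comes from the fact that only the output weights are learned, in a single least-squares step. The sketch below illustrates a basic ELM classifier under stated assumptions: the class name `ELMClassifier`, the helper `make_toy_data`, the sigmoid activation, the uniform weight range, and the number of hidden nodes are all illustrative choices, and the toy two-dimensional inputs merely stand in for the HMAX feature vectors. It is not the configuration used in the experiments of this paper.

```python
import numpy as np


class ELMClassifier:
    """Minimal extreme learning machine for multi-class classification (sketch)."""

    def __init__(self, n_hidden=30, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of the random, untrained hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One-hot target matrix, one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Input weights and biases are drawn once at random and never updated.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        # Output weights: least-squares solution via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]


def make_toy_data(n_per_class=100, seed=1):
    """Three well-separated Gaussian blobs (hypothetical stand-in for features)."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0.0, 0.0], [4.0, 4.0], [0.0, 4.0]])
    X = np.vstack([c + rng.normal(scale=0.5, size=(n_per_class, 2)) for c in centers])
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    model = ELMClassifier(n_hidden=30).fit(X, y)
    print("training accuracy:", (model.predict(X) == y).mean())
```

Because training reduces to one pseudoinverse computation, no iterative weight tuning is required, which is the property that makes ELM attractive as the final classification stage.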
Section snippets
Problem statement
Drawing on ideas from neurophysiology [14], object recognition is defined as the ability to accurately discriminate each named object (“identification”) or set of objects (“categorization”) from all other possible objects, materials, textures, and other visual stimuli, and to do this over a range of identity-preserving transformations of the retinal image of that object (e.g. image transformations resulting from changes in object position, distance, and pose).
An image is a visual representation of
Brief of the HMAX
A long-time goal for computer vision has been to build a system that achieves human-level recognition performance. Riesenhuber and Poggio summarized the basic facts about the ventral visual stream, a hierarchy of brain areas thought to mediate object recognition in cortex, and then proposed the HMAX model [8], which is a natural extension of the model of simple to complex cells of Hubel and Wiesel. Serre et al. improved the original HMAX model by adding multi-scale representations as well as
Design inspiration
This section presents the design inspiration of our object recognition network. Because humans and primates outperform the best machine vision systems with respect to almost any measure, building a system that emulates object recognition in cortex, or that matches human vision as closely as possible, has always been an attractive but elusive goal.
As introduced in Section 1, there exist many image recognition schemes. However, the trade-off between acquired accuracy and computational time poses a
Experiments
In this section, we investigate the performance of the proposed ELM-based recognition algorithm by conducting experiments on image recognition tasks. We select three image datasets: Fifteen Scenes, DARPA LAGR datasets, and Still Action images. Some image samples from the three datasets are shown in Fig. 5.
1. Fifteen Scenes [26]: The Fifteen Scenes dataset is composed of 15 natural categories of urban and rural scenes for a total of 4885 images.
2. DARPA LAGR datasets [27]: There are six datasets
Conclusion
In this paper, we have proposed a novel biologically inspired image recognition network based on HMAX and the extreme learning machine. The network consists of five layers, S1–C1–S2–C2–H, which together complete the whole object recognition task. The first four layers focus on the design of the feature representation structure, and build simple and complex features based on physiological data about the mammalian visual pathways. The final H layer focuses on the learning mechanism of the higher
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant no. 61005085) and Fundamental Research Funds for the Central Universities (2012QNA4024).
References (32)
- Improving object detection with boosted histograms, Image Vis. Comput. (2009)
- et al., Face recognition based on extreme learning machine, Neurocomputing (2011)
- et al., Untangling invariant object recognition, Trends Cognit. Sci. (2007)
- et al., Discrete-time hypersonic flight control based on extreme learning machine, Neurocomputing (2014)
- et al., An extensive comparison of recent classification tools applied to microarray data, Comput. Stat. Data Anal. (2005)
- et al., Enhanced random search based incremental extreme learning machine, Neurocomputing (2008)
- et al., Extreme learning machine: theory and applications, Neurocomputing (2006)
- et al., How does the brain solve visual object recognition?, Neuron (2012)
- M.A. Treiber, An Introduction to Object Recognition, Springer, London,...
- et al., Shape matching and object recognition using shape contexts, IEEE Trans. Pattern Anal. Mach. Intell. (2002)
- Example-based object detection in images by components, IEEE Trans. Pattern Anal. Mach. Intell.
- Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis.
- Object class recognition and localization using sparse features with limited receptive fields, Int. J. Comput. Vis.
- Hierarchical models of object recognition in cortex, Nat. Neurosci.
- Robust object recognition with cortex-like mechanisms, IEEE Trans. Pattern Anal. Mach. Intell.
Yu Zhang received the B.S. degree in information engineering from Xi'an Jiaotong University, Xi'an, China in 2003, and the M.S. and Ph.D. degrees in computer science from Tsinghua University, Beijing, China in 2009. He was a postdoctoral researcher at Tsinghua University from 2009 to 2011 and a visiting scholar at Carnegie Mellon University from 2013 to 2014. He is now a lecturer in the School of Aeronautics and Astronautics at Zhejiang University. His research interests include artificial intelligence, intelligent control, computer vision and unmanned aerial vehicles.
Lin Zhang received the B.S. degree in information and communication engineering from Zhejiang University, China, in 2012. He is currently working toward the M.S. degree in the School of Aeronautics and Astronautics, Zhejiang University, China. His research interests include artificial intelligence, computer vision and visual navigation.
Ping Li received the Ph.D. degree in industrial automation from Zhejiang University, China, in 1988. He was a postdoctoral researcher at Zhejiang University from 1988 to 1990. He is now a Professor in the School of Aeronautics and Astronautics and the Department of Control Science and Engineering, Zhejiang University, China. His research interests cover process control, UAV projects and intelligent transportation systems. He focuses on solving practical problems in scientific research.