Elsevier

Neurocomputing

Volume 174, Part A, 22 January 2016, Pages 286-298

A novel biologically inspired ELM-based network for image recognition

https://doi.org/10.1016/j.neucom.2015.03.117

Abstract

This paper introduces a novel biologically inspired network for image recognition. The Hierarchical Model and X (HMAX) and the extreme learning machine (ELM) are combined to construct a five-layer feed-forward network: S1–C1–S2–C2–H. The first four layers, which originate from HMAX, provide a robust feature representation of a specific object, and the feature classification stage in the H layer is implemented with ELM. The HMAX model simulates the hierarchical processing mechanism of the primate visual cortex to compute complex feature representations. As a biologically inspired learning algorithm for generalized single-hidden-layer feed-forward networks (SLFNs), ELM learns quickly with good generalization performance and performs well in classification applications. Four groups of experiments are performed on three datasets, and the results are compared with state-of-the-art techniques. The experimental results show that the proposed network achieves good performance with fast learning speed.

Introduction

Object recognition is an area of intense research and a very challenging task in computer vision, whereas human vision, with its unique processing mechanism, recognizes objects rapidly, accurately, and effortlessly. Recognizing objects in images is difficult because of variations in illumination, viewpoint, occlusion, scale, and position. The difficulty of object categorization, in turn, lies in capturing the variability in appearance and shape of different objects belonging to the same class while avoiding confusion between objects from different classes. Overcoming these obstacles to achieve robust object recognition would benefit many fields and applications, such as security surveillance, manufacturing, robot navigation, character recognition, and clinical image understanding.

Much research has been devoted to object recognition. Treiber's book [1] gives a good overview of object recognition algorithms used in various applications, including global approaches, transformation-search-based methods, geometrical model driven methods, 3D object recognition schemes, flexible contour fitting algorithms, and descriptor-based methods. In the work of Belongie [2], a set of discrete points sampled from the contour of a shape is used as a shape descriptor, and K nearest neighbors (KNN) is then used for classification. Mohan [3] built a parts-based detector that represents the image with Haar wavelets and then uses a support vector machine (SVM) for classification. Lowe [4] developed an image feature called the scale-invariant feature transform (SIFT), which became the basis for the features in many object recognition algorithms, although it is not an object recognition algorithm by itself. Laptev [5] uses histograms as features, a weighted Fisher linear discriminant as a weak classifier, and AdaBoost for classification. Biologically motivated features based on Gabor filters and MAX operations have also been developed [6], [7]. Although these algorithms have improved object recognition performance, none of the algorithms available today can surpass the performance of the human brain. This suggests that more work is needed to solve the many problems of robust intelligence at which the human brain excels, and thereby to enhance the performance of object recognition.

Object recognition in the human brain is largely invariant to changes in the size, position, and viewpoint of the object. It is perhaps not surprising that, through millions of years of evolution, the human brain has acquired a remarkable ability to recognize objects in a robust, selective, and fast manner. Once it is understood how neuronal circuits achieve these properties, it will likely be possible to translate the biological circuits into algorithms for computer vision and pattern recognition. A hierarchical cortex-based model named Hierarchical Model and X (HMAX) [8], [9] has attracted much attention for two reasons: it focuses on designing simple and complex operations inspired by the visual cortex, and it has been shown in [10] to provide robust representations of specific images, outperforming state-of-the-art methods such as SIFT on various invariance tasks with synthetic images. More recently, Huang et al. [11] proposed the extreme learning machine (ELM), a novel learning algorithm for single-hidden-layer feed-forward networks (SLFNs) that can be applied to regression and classification problems [12]. In [13], it was successfully applied to face recognition, where it improved the recognition accuracy.
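The simple and complex operations mentioned above can be illustrated with a minimal NumPy sketch of the first two HMAX stages: an S1 layer that filters the image with oriented Gabor kernels, and a C1 layer that takes a local maximum over position. This is a simplified illustration rather than the configuration used in the paper: the function names, kernel size, wavelength, and pooling size are all assumptions chosen for readability, and the pooling over neighboring scales that full HMAX implementations use is omitted.

```python
import numpy as np

def gabor_kernel(size=7, wavelength=4.0, sigma=2.8, theta=0.0, gamma=0.3):
    """Zero-mean, unit-norm Gabor kernel (parameter values are illustrative)."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate frame by theta.
    x_r = xs * np.cos(theta) + ys * np.sin(theta)
    y_r = -xs * np.sin(theta) + ys * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    g = envelope * np.cos(2.0 * np.pi * x_r / wavelength)
    g -= g.mean()
    return g / np.linalg.norm(g)

def s1_layer(image, thetas, size=7):
    """Simple cells: absolute Gabor response at every valid position."""
    windows = np.lib.stride_tricks.sliding_window_view(image, (size, size))
    responses = [
        np.abs(np.einsum("ijkl,kl->ij", windows, gabor_kernel(size=size, theta=t)))
        for t in thetas
    ]
    return np.stack(responses)  # shape: (n_orientations, H - size + 1, W - size + 1)

def c1_layer(s1, pool=4):
    """Complex cells: max over non-overlapping pool x pool neighborhoods."""
    n, h, w = s1.shape
    h, w = h - h % pool, w - w % pool  # crop so the map tiles evenly
    blocks = s1[:, :h, :w].reshape(n, h // pool, pool, w // pool, pool)
    return blocks.max(axis=(2, 4))

# Usage: vertical stripes excite the theta = 0 filter far more than theta = pi / 2.
x = np.arange(22)
image = np.tile(np.cos(np.pi * x / 2.0), (22, 1))
c1 = c1_layer(s1_layer(image, thetas=[0.0, np.pi / 2.0]))
```

The max operation in `c1_layer` is what gives the representation its tolerance to small shifts: the pooled value does not change as long as the strongest S1 response stays inside the same neighborhood.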

To improve the performance of image-based object recognition, this paper brings together two biologically inspired algorithms, HMAX and ELM, to construct a novel biologically inspired network for image recognition. Because HMAX features offer good scale and translation invariance, the four-layer HMAX model is employed for feature construction, feature selection, and feature extraction, and provides a robust feature representation of a specific object image. ELM is introduced to classify the resulting feature representations, because it performs better than conventional methods such as SVM and has an extremely fast learning speed, which is akin to the fast learning mechanism of the higher cortical areas. Four groups of experiments are performed on three datasets to demonstrate the novelty and superiority of the proposed network over existing algorithms.
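The classification stage can be sketched with a basic ELM for a single-hidden-layer feed-forward network: the hidden-layer weights are drawn at random and left untrained, and only the output weights are solved in closed form with a pseudoinverse. The sketch below is a generic illustration under stated assumptions, not the authors' implementation: the class name `ELMClassifier`, the sigmoid activation, the number of hidden nodes, and the toy two-dimensional data standing in for C2 feature vectors are all hypothetical choices.

```python
import numpy as np

class ELMClassifier:
    """Basic ELM: random hidden layer, least-squares output weights."""

    def __init__(self, n_hidden=200, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One-hot target matrix, one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Hidden weights and biases are random and never updated.
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # Output weights: minimum-norm least-squares solution of H @ beta = T.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]

# Usage on toy data (two well-separated Gaussian blobs as stand-in features).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, (100, 2)), rng.normal(2.0, 0.5, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
model = ELMClassifier(n_hidden=50).fit(X, y)
train_accuracy = (model.predict(X) == y).mean()
```

Because training reduces to one pseudoinverse, there is no iterative weight tuning, which is the source of the fast learning speed the text attributes to ELM.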

The rest of the paper is organized as follows. Section 2 states the problem and strategy of object recognition. Section 3 presents preliminary information about HMAX and ELM. Section 4 details the proposed biologically inspired ELM-based image recognition algorithm. Section 5 describes several experiments, followed by results and discussion. Section 6 concludes the paper.

Section snippets

Problem statement

Drawing on ideas from neurophysiology [14], object recognition is defined as the ability to accurately discriminate each named object (“identification”) or set of objects (“categorization”) from all other possible objects, materials, textures, and other visual stimuli, and to do this over a range of identity-preserving transformations of the retinal image of that object (e.g. image transformations resulting from changes in object position, distance, and pose).

An image is a visual representation of

Brief of the HMAX

A long-standing goal of computer vision has been to build a system that achieves human-level recognition performance. Riesenhuber and Poggio summarized the basic facts about the ventral visual stream, a hierarchy of brain areas thought to mediate object recognition in cortex, and then proposed the HMAX model [8], which is a natural extension of Hubel and Wiesel's model of simple and complex cells. Serre et al. improved the original HMAX model by adding multi-scale representations as well as

Design inspiration

This section presents the design inspiration for our object recognition network. Because humans and other primates outperform the best machine vision systems by almost any measure, building a system that emulates object recognition in cortex, or that matches human vision as closely as possible, has always been an attractive but elusive goal.

As introduced in Section 1, there exist many image recognition schemes. However, the trade-off between acquired accuracy and computational time poses a

Experiments

In this section, we investigate the performance of the proposed ELM-based recognition algorithm through experiments on image recognition tasks. We select three image datasets: Fifteen Scenes, the DARPA LAGR datasets, and Still Action images. Some image samples from the three datasets are shown in Fig. 5.

  1. Fifteen Scenes [26]: The Fifteen Scenes dataset is composed of 15 natural categories of urban and rural scenes for a total of 4885 images.

  2. DARPA LAGR datasets [27]: There are six datasets

Conclusion

In this paper, we have proposed a novel biologically inspired image recognition network based on HMAX and the extreme learning machine. The network consists of five layers, S1–C1–S2–C2–H, which together complete the whole object recognition task. The first four layers focus on the design of the feature representation structure and build simple and complex features based on physiological data about the mammalian visual pathways. The final H layer focuses on the learning mechanism of the higher

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 61005085) and Fundamental Research Funds for the Central Universities (2012QNA4024).

References (32)

  • A. Mohan et al., Example-based object detection in images by components, IEEE Trans. Pattern Anal. Mach. Intell. (2001)
  • D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. (2004)
  • T. Serre, L. Wolf, T. Poggio, Object recognition with features inspired by visual cortex, in: 2005 IEEE Computer...
  • J. Mutch et al., Object class recognition and localization using sparse features with limited receptive fields, Int. J. Comput. Vis. (2008)
  • M. Riesenhuber et al., Hierarchical models of object recognition in cortex, Nat. Neurosci. (1999)
  • T. Serre et al., Robust object recognition with cortex-like mechanisms, IEEE Trans. Pattern Anal. Mach. Intell. (2007)

Cited by (16)

    • A model for fine-grained vehicle classification based on deep learning

      2017, Neurocomputing
      Citation Excerpt :

      The process of this model is as Fig. 6. Original image is first fed into a convolutional network which uses VGG16 network structure as depicted in [36] [53]. And then feature maps of the original image will be generated, on which a RPN network is applied to acquire region proposals.

    • Deep object recognition across domains based on adaptive extreme learning machine

      2017, Neurocomputing
      Citation Excerpt :

      Many improvements and new applications of ELMs have been proposed by world-wide researchers. The newest work about improved extreme learning machines in deep auto-encoder, local receptive fields for deep learning, transfer learning, and semi-supervised learning have also been proposed [26–30,36–43]. Yang et al. proposed a subnetwork nodes based multilayer ELM framework for representational learning [44].

    • Energy saving and prediction modeling of petrochemical industries: A novel ELM based on FAHP

      2017, Energy
      Citation Excerpt :

      Thus the training speed and the generalization accuracy are high, having strong robustness and not being prone to local optima [19]. Due to these advantages, the ELM has been used in self-organized clustering [20], regression and multiclass classification [21], traffic sign recognition [22], image recognition [23], computer vision processing [24] and feature selection [25]. Cao et al. used the self-adaptive differential evolution algorithm to optimizing the learning parameters of the hidden neuron and obtained an improved self-adaptive evolutionary ELM learning algorithm [26].

    • Voting based q-generalized extreme learning machine

      2016, Neurocomputing
      Citation Excerpt :

      Other application-specific ensembles include online learning ensembles of extreme learning machines for predictions of variables in changing environments [20] and ELM ensembles based on average score aggregation for classification of remote sensing images [21]. Recent work has also been done on single classifiers, such as biologically inspired ELM-based networks simulating processing mechanism in primate visual cortex [22], self-organized clustering techniques using ELMs [23], and parsimonious extreme learning machines with sequential partial orthogonalization [24], while others focus on applications in a variety of domains [25–28]. The choice of activation functions may strongly influence performance of neural networks in complex problems.

    Yu Zhang received the B.S. degree in information engineering from Xi'an Jiaotong University, Xi'an, China in 2003, and the M.S. and Ph.D. degrees in computer science from Tsinghua University, Beijing, China in 2009. He was a postdoctoral researcher at Tsinghua University from 2009 to 2011 and a visiting scholar at Carnegie Mellon University from 2013 to 2014. He is now a lecturer in the School of Aeronautics and Astronautics at Zhejiang University. His research interests include artificial intelligence, intelligent control, computer vision and unmanned aerial vehicles.

    Lin Zhang received the B.S. degree in information and communication engineering from Zhejiang University, China, in 2012. He is currently working toward the M.S. degree in the School of Aeronautics and Astronautics, Zhejiang University, China. His research interests include artificial intelligence, computer vision and visual navigation.

    Ping Li received the Ph.D. degree in industrial automation from Zhejiang University, China, in 1988. He was a postdoctoral researcher at Zhejiang University from 1988 to 1990. He is now a Professor in the School of Aeronautics and Astronautics and the Department of Control Science and Engineering, Zhejiang University, China. His research interests cover process control, UAV projects and intelligent transportation systems. He focuses on solving practical problems in scientific research work.
