Evolutionary multi-objective visual cortex for object classification in natural images

https://doi.org/10.1016/j.jocs.2015.10.011

Highlights

  • We propose a new methodology for image description.

  • We present a multi-objective approach for brain programming.

  • We match the state of the art in classifying GRAZ-01 and outperform it on GRAZ-02.

Abstract

In recent years, computer vision systems have used the human visual system as inspiration for solving tasks such as object detection and classification. Computational models such as the artificial visual cortex (AVC) have shown promising results in solving such problems. Thus, this paper proposes a new methodology for creating an image descriptor vector for classification while, at the same time, finding the objects’ location within the image. This work also implements the brain programming paradigm from a multi-objective perspective in order to improve performance in the object classification task. The methodology is used to train the proposed model to classify the images from the GRAZ-01 and GRAZ-02 databases. The solutions found in this research match, and in some cases outperform, other state-of-the-art techniques for classifying the aforementioned databases.

Introduction

Numerous natural systems (brains, immune systems and societies) and artificial systems (parallel and distributed computing, artificial neural networks and evolutionary programs) are generally characterized by behaviors that emerge from non-trivial interactions between a large number of components, often based on hierarchical structures [1]. The complexity of understanding and designing such systems for difficult tasks resides in finding the best interactions. Accordingly, some research communities have focused their efforts on analyzing and creating such systems.

Holland describes a complex adaptive system as the integration of several interdependent entities that collaborate to solve a given task and are able to adapt to environmental changes or variations among the parts [2]. The elements that compose such systems are often called agents [3]. A well-known example of an adaptive system for classification is the simple pattern recognition device presented in Holland's seminal work [4]. In that example, the complexity resides in the large number of possible configurations of an array of binary sensors of size a × b, that is 2^(ab), and in the system's need to find the right configuration for recognizing a given pattern. Today there exist examples of this kind of system that focus on solving computer vision problems.
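To make the size of that search space concrete, the following minimal Python sketch counts and enumerates the on/off states of an a × b array of binary sensors, which number 2 raised to the power a·b. The function names are hypothetical and chosen only for illustration; they do not come from Holland's work or from this paper.

```python
from itertools import product


def count_configurations(a: int, b: int) -> int:
    """Number of distinct on/off states of an a x b array of binary sensors."""
    return 2 ** (a * b)


def enumerate_configurations(a: int, b: int):
    """Yield every configuration as a tuple of a rows, each a tuple of b bits."""
    for bits in product((0, 1), repeat=a * b):
        yield tuple(tuple(bits[r * b:(r + 1) * b]) for r in range(a))


if __name__ == "__main__":
    # Exhaustive enumeration is only feasible for tiny arrays.
    print(count_configurations(2, 2))    # 2 x 2 array
    print(count_configurations(10, 10))  # 10 x 10 array: far too many to enumerate
```

Even a modest 10 × 10 array already has 2^100 configurations, which is why an adaptive search is needed instead of exhaustive enumeration.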

Sight is one of the most important senses for human beings, since it contributes approximately 70% of the information received by the brain. This information supports the decision-making process during interactions with the environment. Several scientific communities have focused their research on understanding the organization of the brain with the goal of emulating it. There are several computational models [5], [6], [7], [8], [9], [10], [11], [12], [13], [14] inspired by the hierarchical structure of the human visual system, its neurophysiological characteristics and neuropsychological theories such as the feature integration theory [15], the biased competition theory [16], the recognition-by-components paradigm [17], the simple and complex cells model [18] and the two path cortical model [19]. These systems address different visual tasks such as object recognition, detection and classification.

Nowadays there are several evolutionary systems that focus on solving the aforementioned visual tasks. For example, Olague and Trujillo describe a Multi-objective Genetic Programming (MOGP) system for synthesizing interest point detectors for view-based object recognition [20]. Shao et al. proposed a feature learning system for classifying the objects in the Caltech-101 database. They evolve a set of two-dimensional operators using MOGP to create what they call a “near-optimal” image descriptor [21]. Similarly, Al-Sahaf et al. present two GP-based methods, the One-shot GP and Compound-GP systems, that aim to evolve a program for the binary classification of texture images [22].

A computational model of particular interest for this work is the artificial visual cortex (AVC) [23], which in turn is derived from the HMAX model [9], since it approaches the object recognition problem based on the human visual cortex. This model, proposed by Olague et al., performs well on the absence/presence problem for object recognition. The AVC is based primarily on two models: a psychological model called feature integration theory and a neurophysiological model called the two pathway cortical model.

The first theory states that the visual attention task in human beings is performed in two stages. The first one is called the pre-attentive stage, where visual information is processed in parallel over the different feature dimensions that compose the scene: shape, color, orientation, spatial frequency, brightness and motion direction. The second stage, called focal attention, integrates the features extracted in the previous stage in order to highlight a region of the scene. Thus, visual attention is the capability of a creature, living or artificial, to focus on an object of interest in a visual environment [24]. Visual attention can be formally defined as “the process that establishes a relationship between the different properties in the scene, perceived through the visual system, with the objective of finding the best aspect for solving the task at hand” [25].

The second theory is the two pathway cortical paradigm. This neurophysiological model states that there are two information routes within the visual cortex, the dorsal and ventral streams. Both subsystems receive the same visual information as input; nevertheless, they differ in the information transformations that each performs [27]. The dorsal stream is mainly related to the spatial detection of objects and visual attention [16], [26]. The ventral stream, in turn, is linked to object recognition and shape representation [19].

The natural visual system generally performs the detection process as part of solving the classification task. However, there is no evidence on which process has greater impact at the moment of processing the visual information. In this work, we focus primarily on the classification task; nevertheless, the idea is to optimize the model using a multi-objective perspective by emulating both processes of the natural system, as a strategy for obtaining better classification results.

The main contribution of this work is the study of object classification from a multi-objective perspective, based on the integration of the single objective approaches developed in [23], [30], while extending the preliminary results published in [31]. In this article the system's implementation is detailed following the SPEA2 algorithm.
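As background on the SPEA2 algorithm named above, the sketch below computes Pareto dominance and the standard SPEA2 fitness (raw fitness plus a density term) for a small set of objective vectors. It is a minimal illustration under stated assumptions, not the authors' implementation: both objectives are assumed to be minimized, the sample vectors are made up, and the `points` list stands in for the union of population and archive that SPEA2 evaluates.

```python
import math


def dominates(u, v):
    """True if u Pareto-dominates v (all objectives are minimized)."""
    return all(x <= y for x, y in zip(u, v)) and any(x < y for x, y in zip(u, v))


def spea2_fitness(points):
    """Return SPEA2 fitness F = R + D for each objective vector (lower is better)."""
    n = len(points)
    # Strength: how many solutions each one dominates.
    strength = [sum(dominates(p, q) for q in points) for p in points]
    # Raw fitness: sum of the strengths of every solution that dominates this one.
    raw = [
        sum(strength[j] for j in range(n) if dominates(points[j], points[i]))
        for i in range(n)
    ]
    # Density: inverse of the distance to the k-th nearest neighbour, k = sqrt(n).
    k = max(1, int(math.sqrt(n)))
    fitness = []
    for i in range(n):
        dists = sorted(math.dist(points[i], points[j]) for j in range(n) if j != i)
        sigma_k = dists[min(k, len(dists)) - 1] if dists else 0.0
        fitness.append(raw[i] + 1.0 / (sigma_k + 2.0))
    return fitness


if __name__ == "__main__":
    # Hypothetical two-objective error values, for illustration only.
    sample = [(0.1, 0.4), (0.3, 0.2), (0.5, 0.5), (0.6, 0.6)]
    for point, fit in zip(sample, spea2_fitness(sample)):
        print(point, round(fit, 3))
```

Non-dominated solutions have zero raw fitness, so their total fitness stays below 1; the density term only breaks ties among solutions with the same raw fitness.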

This methodology has been implemented for classifying the Person class from the GRAZ-02 database [31], which is part of the European network PASCAL's Visual Objects Classes challenge. Moreover, in this work we optimize the system for classifying four other classes: Bike and Person from GRAZ-01, and Bike and Cars from GRAZ-02. We opted for these five classes since they are used as a testing standard for image classification [29], [11], [14], [13], and also because each evolutionary run requires considerable computational time, as will be explained in Section 3.

As part of the contributions, this work presents a cross-validation process, since GP has been criticized for over-training; this test shows the performance of the methodology and validates the results of the learning strategy explained in the following section. In addition, a frequency-of-use analysis of the results shows how often the functions and terminals are applied in the proposed model while processing the GRAZ database.
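For readers unfamiliar with cross-validation, the following minimal Python sketch shows one common variant, k-fold splitting of image indices into disjoint training and test folds. The exact cross-validation protocol used by the authors is not described in this excerpt, so the k-fold scheme, the function name and the parameters below are assumptions made purely for illustration.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Yield (train, test) index lists for k shuffled, disjoint folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    # Hypothetical example: 10 images split into 5 folds.
    for train, test in k_fold_indices(10, 5):
        print(sorted(test), len(train))
```

Each image appears in exactly one test fold, so every image is used once for testing and k − 1 times for training.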

The remainder of this paper is organized as follows. Section 2 details the stages of our approach using a multi-objective evolutionary framework, where we describe the AVCMO model focusing on the proposed methodology for building the image descriptor and the brain programming algorithm under a multi-objective approach. Then, Section 3 provides the performance of the AVCMO model for image classification of GRAZ-01 and GRAZ-02 classes. Finally, the conclusions for this work are given in Section 4.

Methodology

The AVC model was designed for classifying images regardless of color, orientation, illumination conditions, scale or position of the object of interest [28]. One of its innovations is the way it selects prominent image features in order to build an abstract representation of the object. Hence, the system seeks prominent points in the image in order to build an image descriptor which is later used for classification. For this reason, when processing images where the object of interest occupies

Experiments and results

In this work, we approach the classification problem from a presence/absence perspective. We follow a protocol composed of three steps; the first two define the training stage of the model, while the third one corresponds to the testing phase. Therefore, we need three image sets for the experiments, one per step. This protocol is described next:

  1. Training: this step starts by evaluating each solution with an image set called training; one image descriptor is created per image. Then, these

Conclusions and future work

This paper proposed a methodology for creating an image descriptor vector using the AVCMO model for classification purposes. The system builds the descriptor using visual information taken from images of the object of interest by extracting information exclusively from the image region where the object is located; hence, implicitly finding its location. The AVCMO models were optimized through the evolutionary system called BP following a multi-objective design. The proposed strategy was applied

Acknowledgments

This work was funded by CONACyT México through the research project 155045 – “Evolución de Cerebros Artificiales en Visión por Computadora”. First author supported by scholarship 267339/220773.

References (37)

  • K. Fukushima, Neural network model for selective attention in visual pattern recognition and associative recall, Appl. Opt. (1987)
  • B. Olshausen et al., A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information, J. Neurosci. (1993)
  • D. Walther et al., Attentional selection for object recognition – a gentle way, Biol. Motiv. Comput. Vis. (2002)
  • L. Itti et al., A model of saliency-based visual attention for rapid scene analysis, IEEE Trans. Pattern Anal. Mach. Intell. (1998)
  • M. Riesenhuber et al., Hierarchical models of object recognition in cortex, Nat. Neurosci. (1999)
  • T. Serre et al., Theory of object recognition: computations and circuits in the feedforward path of the ventral stream in primate visual cortex, Technical report (2005)
  • J. Mutch et al., Object class recognition and localization using sparse features with limited receptive fields, Int. J. Comput. Vis. (2008)
  • H. Wersing et al., Learning optimized features for hierarchical models of invariant object recognition, Neural Comput. (2003)
    Daniel E. Hernández is a Ph.D. candidate in Computer Science at Centro de Investigación Científica y de Educación Superior de Ensenada, B.C., (CICESE), Mexico. He received the M.Sc. degree in Computer Science in 2011 from CICESE and he holds a Bachelor's degree in Computer Engineering from Universidad Autónoma de Baja California (UABC), Mexico. He is working in the EvoVision research team. His research interests include computer vision, robotics, evolutionary computation and bio-inspired algorithms.

    Eddie Clemente received the Ph.D. degree in Computer Science in 2015 and the M.Sc. degree in Computer Science in 2006, both from the Centro de Investigación Científica y de Educación Superior de Ensenada, B.C., (CICESE), Mexico. He holds a Bachelor's degree in Mechatronics Engineering from UPIITA-IPN, Mexico. He is working as a member of the Robotics and Control research team from the Instituto Tecnológico de Ensenada. His research interests include evolutionary computer vision, robotics and evolutionary computation.

    Gustavo Olague received the Ph.D. degree in Computer Vision, Graphics and Robotics from INPG and INRIA. He is currently a Professor in the Computer Science Department at CICESE in Ensenada. Professor Olague has written over one hundred conference and journal papers and co-edited two special issues in Pattern Recognition Letters and Evolutionary Computation, as well as served as co-chair of the Real-World Application track at the Genetic and Evolutionary Computation Conference. Dr. Olague has received numerous distinctions such as the Talbert Abrams award offered by the ASPRS; best paper awards at major conferences like GECCO, EvoIASP, and EvoHOT; and has twice received the Bronze Medal at the Human-Competitive awards at GECCO. He is the author of the book Evolutionary Computer Vision published by Springer.

    José L. Briseño received the M.Sc. degree in Electronic Instrumentation and Telecommunications in 1983 from the Scientific Research Center and Higher Education of Ensenada (CICESE), Mexico. He holds a bachelor's degree in Communications and Electronics from the Guadalajara University, Mexico, graduated in 1977. Since 2000 he has been a full professor of artificial intelligence at CICESE, where he is an associate researcher at the EvoVision laboratory from the Computer Science Department. His research focus is machine learning and knowledge processing by ontologies.
