Neural Networks

Volume 20, Issue 5, July 2007, Pages 590-597

Integrating support vector machines and neural networks

https://doi.org/10.1016/j.neunet.2006.12.003

Abstract

Support vector machines (SVMs) are a powerful technique developed in the last decade to effectively tackle classification and regression problems. In this paper we describe how support vector machines and artificial neural networks can be integrated in order to classify objects correctly. This technique has been successfully applied to the problem of determining the quality of tiles. Using an optical reader system, some features are automatically extracted, then a subset of the features is determined and the tiles are classified based on this subset.

Introduction

Support vector machines (SVMs) are a novel and popular learning technique for solving different classification problems (Chang et al., 2004; Cristianini and Shawe-Taylor, 2000; Vapnik, 1995) and data mining problems in various areas such as image processing, signal processing, pattern recognition, regression and so on.

A classification task usually involves a training set containing "target values" (class labels) and several "attributes" (features). The goal of SVMs is to produce a model that predicts target values for new data instances. More specifically, given a training set of attribute–label pairs $S=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$, where $x_i\in\mathbb{R}^n$ and $y_i\in\{-1,1\}$ for $i=1,\dots,N$, and a map $\phi:\mathbb{R}^n\to\mathbb{R}^m$, the optimization problem
$$\min_{w,b,\xi}\ \tfrac{1}{2}w^T w + C\sum_{i=1}^{N}\xi_i \quad \text{subject to} \quad y_i\bigl(w^T\phi(x_i)+b\bigr)\ge 1-\xi_i,\quad \xi_i\ge 0,\quad i=1,\dots,N \tag{1}$$
must be solved in order to obtain the vector $w$ and the scalar $b$. The classification function $\operatorname{sgn}\{w^T\phi(x)+b\}$ is then used to discriminate between the two sets of elements.
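As a concrete illustration of problem (1), the following sketch trains a linear soft-margin SVM on a toy two-class data set and recovers the classifier $\operatorname{sgn}\{w^T x + b\}$; the data and the scikit-learn solver are our own choices, not the paper's.

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class data with labels in {-1, +1}, as in the paper's formulation.
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([-1, -1, 1, 1])

# SVC solves the dual of problem (1); kernel="linear" corresponds to phi = identity.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The decision function is w^T x + b; its sign gives the predicted class.
w, b = clf.coef_[0], clf.intercept_[0]
pred = np.sign(X @ w + b)
```

On this linearly separable toy set the recovered sign function reproduces the training labels exactly.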

In Problem (1), the slack variables $\xi_i$ assume nonzero values only in correspondence with points that are misclassified. The objective function has two terms: the first attempts to maximize the distance between the bounding planes, while the other minimizes the classification errors. The parameter $C>0$ is introduced to balance the emphasis between these two goals. A small value of $C$ indicates that most of the importance is placed on separating the two planes; a large value of $C$, on the contrary, indicates that it is important to reduce the classification error. Therefore, finding the correct value of $C$ is typically an experimental task (Cherkassky and Ma, 2002; Joachims, 2002; Wahba et al., 2000), accomplished via a training set and cross-validation (Stone, 1974).
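The experimental tuning of $C$ via cross-validation can be sketched as a grid search; the candidate grid and synthetic data below are illustrative choices, not values from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic two-class data standing in for a real training set.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# 5-fold cross-validation over a logarithmic grid of C values.
grid = GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 0.1, 1, 10, 100]}, cv=5)
grid.fit(X, y)
best_C = grid.best_params_["C"]
```

The value of `best_C` maximizing the cross-validated accuracy is then used to train the final classifier on the full training set.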

In addition to classification, support vector machines can be used to solve the feature selection problem effectively: while discriminating between two different classes of data, the most important features that allow the separation of the two classes are selected at the same time.

In this paper we present a novel procedure that integrates support vector machines and artificial neural networks in order to solve a specific real-world problem. Using SVMs we select the “best” features that will be used as inputs to the artificial neural networks. In Section 2 the feature selection problem is presented (see Bradley, Mangasarian, and Street (1998) and references therein for a more detailed explanation of the problem) with a parametric objective function and linear constraints. We briefly describe artificial neural networks in Section 3. In Section 4 the problem of determining the quality of tiles is described. Finally, Section 5 contains numerical test results as well as comparisons with two different normalizations.

We briefly describe our notation now. All vectors will be column vectors and will be indicated by a lower case italic letter ($x, y, \dots$). The scalar (inner) product of two vectors $x$ and $y$ in the $n$-dimensional real space $\mathbb{R}^n$ will be denoted by $x^T y$. A column vector of ones of arbitrary dimension will be denoted by $e$. Matrices will be indicated by an upper case italic letter. For a matrix $A$ we will denote the transpose with $A^T$. For a vector $v\in\mathbb{R}^n$, $v_*$ is the vector with components
$$(v_*)_j=\begin{cases}1 & \text{if } v_j>0\\ 0 & \text{otherwise.}\end{cases}$$
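The componentwise indicator vector $v_*$ is straightforward to compute; a minimal NumPy sketch:

```python
import numpy as np

# v_* has a 1 wherever the component of v is strictly positive, else 0.
v = np.array([-1.5, 0.0, 2.0, 0.3])
v_star = (v > 0).astype(int)  # -> array([0, 0, 1, 1])
```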

Section snippets

Feature selection via mathematical programming

In this section we discuss the important problem of feature selection (Bennett and Bredensteiner, 1997, John et al., 1994, Kira and Rendell, 1992, Kittler, 1986, Le Cun et al., 1990, Mangasarian, 1996, Stoppiglia et al., 2003). In particular in Stoppiglia et al. (2003), the authors propose a probe feature method that allows one to rank and select features.
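The paper's exact parametric formulation is not reproduced in this snippet; a commonly used surrogate with the same spirit, which we sketch here purely for illustration, is an $\ell_1$-penalized linear SVM whose sparse weight vector directly identifies the selected features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic data: 20 features, only a few of which carry class information.
X, y = make_classification(n_samples=300, n_features=20, n_informative=4,
                           n_redundant=0, random_state=0)

# The L1 penalty drives many components of w to zero; the surviving
# nonzero components index the selected features.
svm = LinearSVC(penalty="l1", dual=False, C=0.5, max_iter=5000).fit(X, y)
selected = np.flatnonzero(np.abs(svm.coef_[0]) > 1e-6)
```

The indices in `selected` are the features retained for the subsequent classification stage.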

The feature selection problem consists in discriminating between two finite sets of points in the n-dimensional feature space using a

Neural network model: The multilayer perceptron

The other tool that we plan to use in our application is Artificial Neural Networks (ANNs) (Bishop, 1995; Hecht-Nielsen, 1989). Based on a biological analogy, artificial neural networks try to emulate the human brain's ability to learn from examples and incomplete data, and to generalize concepts.

An artificial neural network consists of a pool of simple processing units which communicate by sending signals to each other over a large number of weighted connections.
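The behaviour of such units can be made concrete with a minimal forward pass, sketched here with sigmoid activations and randomly initialized weights (all choices ours, for illustration only):

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Each unit computes a weighted sum of its inputs, then a sigmoid."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))      # hidden-layer activations
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))   # output activation

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)  # 2 inputs -> 3 hidden units
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)  # 3 hidden units -> 1 output
out = forward(np.array([0.5, -0.2]), W1, b1, W2, b2)
```

Training consists of adjusting the weight matrices `W1`, `W2` (and the biases) so that the network's outputs match the desired targets.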

We will use multilayer feed

Outline of the problem

In this section we describe the real-world application we plan to solve. The goal is to determine the quality of objects using a technique that integrates support vector machines and artificial neural networks. Using support vector machines we select a subset of the features of the objects while the final classification of the object is achieved via an ANN.
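The two-phase scheme (SVM-based feature selection followed by ANN classification) can be sketched as a single pipeline; the $\ell_1$ linear SVM, the MLP configuration, and the synthetic data below are our own illustrative stand-ins for the paper's components.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Synthetic stand-in for the tile feature data.
X, y = make_classification(n_samples=400, n_features=30, n_informative=5,
                           n_redundant=0, random_state=1)

pipe = Pipeline([
    # Phase 1: sparse SVM selects a subset of the features.
    ("select", SelectFromModel(
        LinearSVC(penalty="l1", dual=False, C=0.5, max_iter=5000))),
    # Phase 2: a multilayer perceptron classifies using that subset.
    ("classify", MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                               random_state=1)),
])
pipe.fit(X, y)
acc = pipe.score(X, y)
```

In practice the accuracy would of course be measured on a held-out test set rather than on the training data.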

The objects we deal with and that we have to inspect are clay tiles of the kind known as "Portuguese", with standard dimensions of 41 cm × 25.5 cm.

Computational results

After the images were acquired via the prototype vision system and the features extracted, a set of 1781 patterns was constructed for training and testing the classification system.

Before solving problem (6), we separated the blobs into just two classes: tiles with cracks, and those without cracks but with other defects (i.e., salt-and-pepper and generic unacceptable defects); we associated the value 1 with the first set and the value −1 with all remaining blobs.
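This binary encoding of the defect classes amounts to a simple mapping; the class names below are hypothetical placeholders, since only the {+1, −1} encoding is stated in the text.

```python
# Hypothetical defect labels for four blobs; the names are illustrative,
# only the {+1, -1} encoding follows the paper.
labels = ["crack", "salt_and_pepper", "crack", "generic_defect"]
y = [1 if lab == "crack" else -1 for lab in labels]  # -> [1, -1, 1, -1]
```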

Conclusions

The main contribution of this paper is the definition of a novel method, in two phases, for addressing classification problems by integrating support vector machines and artificial neural networks. Computational experiments show the effectiveness of this method, leading to excellent results for the problem we considered.

This methodology can be easily applied to different classes of problems (such as financial market forecasting), where again from a large number of features a subset must be

Acknowledgements

The authors would like to thank Luca Girolami from Sigma S.p.A. for providing us with the data from the simulator and for the useful contributions on different aspects discussed in this paper.

References (21)

  • K.P. Bennett et al.

    Feature minimization within decision trees

    Computational Optimization and Applications

    (1997)
  • C. Bishop

    Neural networks for pattern recognition

    (1995)
  • P.S. Bradley et al.

    Feature selection via mathematical programming

    INFORMS Journal on Computing

    (1998)
  • R. Capparuccia et al.

    A successive overrelaxation backpropagation algorithm for neural network training

    IEEE Transactions on Neural Networks

    (1999)
  • Chang, C. C., Hsu, C. W., & Lin, C. J. (2004). A practical guide to support vector classification....
  • V. Cherkassky et al.

    Selection of meta-parameters for support vector regression

  • N. Cristianini et al.

    An introduction to support vector machines

    (2000)
  • GAMS. (2005). General Algebraic Modelling System....
  • Hecht-Nielsen, R. (1989). Theory of the backpropagation neural network. In International joint conference on neural...
  • D.E. Rumelhart et al.

    Learning representations by back-propagating errors

    Nature

    (1986)
There are more references available in the full text version of this article.

Cited by (22)

  • Using least square support vector regression with genetic algorithm to forecast beta systematic risk

    2015, Journal of Computational Science
    Citation Excerpt:

    Neural nets, designed to pick up nonlinear patterns from long time-series, are of great interest to researchers. Though ANN has strong parallel processing and fault tolerance ability, the practicability of ANN is less than ideal due to several weaknesses, such as over-fitting, slow convergence velocity, and the problem of easily becoming trapped in local optima [30]. Armstrong and Green [23] suggested that neural networks should be avoided because the method ignores prior knowledge and because the results are difficult to understand.

  • Shared domains of competence of approximate learning models using measures of separability of classes

    2012, Information Sciences
    Citation Excerpt:

    Recent contributions include new developments in classification assuming fixed probability distributions of the data [38], the use of recursive SVMs to tackle the dimensionality problem [36], the study of formulations of the loss function in order to deal with imbalanced data sets directly [39] or simultaneously selects relevant features during classifier construction [24]. SVMs and ANNs are very related in their foundations and their integration has been already studied [10], and the analysis on the different properties they model and their advantages have been studied [15]. It is well-known that the prediction capabilities of ANNs and SVMs in classification are dependent on the problem’s characteristics.
