Integrating support vector machines and neural networks
Introduction
Support vector machines (SVMs) are a popular learning technique for solving classification problems (Chang et al., 2004, Cristianini and Shawe-Taylor, 2000, Vapnik, 1995) and data mining problems in areas such as image processing, signal processing, pattern recognition, and regression.
A classification task usually involves a training set containing “target values” (class labels) and several “attributes” (features). The goal of SVMs is to produce a model that predicts target values for new data instances. More specifically, given a training set of attribute–label pairs (x_i, y_i), i = 1, …, m, where x_i ∈ R^n and y_i ∈ {−1, +1}, and a mapping function φ, the optimization problem

  min_{w,b,ξ}  (1/2) wᵀw + C ∑_{i=1}^{m} ξ_i
  subject to  y_i (wᵀφ(x_i) + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,  i = 1, …, m    (1)

must be solved in order to obtain the vector w and the scalar b. Successively, the classification function f(x) = sign(wᵀφ(x) + b) is used to discriminate between the two sets of elements.
In Problem (1), the slack variable ξ_i will assume a nonzero value only in correspondence with points that are misclassified. The objective function has two terms: the first attempts to maximize the distance between the bounding planes, while the other minimizes the classification errors. The parameter C is introduced to balance the emphasis between these two goals. A small value of C indicates that most of the importance is placed on the separation between the bounding planes. A large value of C, on the contrary, indicates that it is important to reduce the classification error. Therefore, finding the correct value of C is typically an experimental task (Cherkassky and Ma, 2002, Joachims, 2002, Wahba et al., 2000), accomplished via a training set and cross-validation (Stone, 1974).
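The cross-validation search for C described above can be sketched as follows. This is an illustrative example, not the paper's code; scikit-learn, the synthetic data, and the grid of C values are our assumptions.

```python
# Hedged sketch: choosing the trade-off parameter C by cross-validation
# with a soft-margin SVM. Small C emphasizes a wide margin; large C
# penalizes misclassification more heavily.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for a labeled training set (illustrative only).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

search = GridSearchCV(SVC(kernel="linear"),
                      {"C": [0.01, 0.1, 1, 10, 100]}, cv=5)
search.fit(X, y)
print(search.best_params_["C"])
```

The grid and fold count here are conventional defaults; in practice both would be tuned to the data at hand.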
In addition to classification, support vector machines can be used to solve the feature selection problem effectively: while discriminating between two different classes of data, the most important features that allow separation of the two classes are also selected.
In this paper we present a novel procedure that integrates support vector machines and artificial neural networks in order to solve a specific real-world problem. Using SVMs we select the “best” features that will be used as inputs to the artificial neural networks. In Section 2 the feature selection problem is presented (see Bradley, Mangasarian, and Street (1998) and references therein for a more detailed explanation of the problem) with a parametric objective function and linear constraints. We briefly describe artificial neural networks in Section 3. In Section 4 the problem of determining the quality of tiles is described. Finally, Section 5 contains numerical test results as well as comparisons with two different normalizations.
We briefly describe our notation now. All vectors will be column vectors and will be indicated by a lower case italic letter (e.g., x). The scalar (inner) product of two vectors x and y in the n-dimensional real space R^n will be denoted by xᵀy. A column vector of ones of arbitrary dimension will be denoted by e. Matrices will be indicated by an upper case italic letter. For a matrix A we will denote the transpose with Aᵀ. For a vector x, x₊ is the vector with components (x₊)_i = max{0, x_i}.
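The plus function x₊ used throughout this notation can be demonstrated in one line. A minimal NumPy illustration (our own, not from the paper):

```python
# The plus function: each component is replaced by max{0, x_i}.
import numpy as np

x = np.array([-2.0, 0.5, -0.1, 3.0])
x_plus = np.maximum(x, 0.0)   # (x_+)_i = max{0, x_i}
print(x_plus)
```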
Section snippets
Feature selection via mathematical programming
In this section we discuss the important problem of feature selection (Bennett and Brendensteiner, 1997, John et al., 1994, Kira and Rendell, 1992, Kittler, 1986, Le Cun et al., 1990, Mangasarian, 1996, Stoppiglia et al., 2003). In particular in Stoppiglia et al. (2003), the authors propose a probe feature method that allows one to rank and select features.
The feature selection problem consists in discriminating between two finite sets of points in the -dimensional feature space using a
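One common mathematical-programming route to this discrimination-plus-selection task is a sparsity-inducing (l1) penalty on the separating plane's weight vector, which drives the weights of irrelevant features to exactly zero. The sketch below is illustrative and assumes scikit-learn's `LinearSVC`; the paper's own formulation may differ.

```python
# Hedged sketch: feature selection with a sparse (l1-penalized) linear SVM.
# Features whose weights survive the l1 penalty are the "selected" ones.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic data with only a few informative features (illustrative).
X, y = make_classification(n_samples=300, n_features=20, n_informative=4,
                           n_redundant=0, random_state=0)

clf = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000).fit(X, y)
selected = np.flatnonzero(np.abs(clf.coef_.ravel()) > 1e-6)
print(len(selected))  # typically far fewer than the original 20 features
```

Lowering C strengthens the penalty and shrinks the selected subset further.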
Neural network model: The multilayer perceptron
The other tool that we plan to use in our application is Artificial Neural Networks (ANNs) (Bishop, 1995, Hecht-Nielsen, 1989). Based on a biological analogy, artificial neural networks try to emulate the human brain’s ability to learn from examples and incomplete data, and to generalize concepts.
An artificial neural network consists of a pool of simple processing units which communicate by sending signals to each other over a large number of weighted connections.
We will use multilayer feed
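A multilayer feed-forward network of the kind described above can be sketched as follows. This uses scikit-learn's `MLPClassifier` purely for illustration; the paper's network architecture, activation, and training algorithm are not specified here and may differ.

```python
# Hedged sketch of a multilayer perceptron: simple processing units
# connected by weighted links, trained from labeled examples.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# One hidden layer of 10 units with a sigmoid-like activation (assumed).
mlp = MLPClassifier(hidden_layer_sizes=(10,), activation="logistic",
                    max_iter=2000, random_state=0).fit(X, y)
print(mlp.score(X, y))
```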
Outline of the problem
In this section we describe the real-world application we plan to solve. The goal is to determine the quality of objects using a technique that integrates support vector machines and artificial neural networks. Using support vector machines we select a subset of the features of the objects while the final classification of the object is achieved via an ANN.
The objects we have to inspect are clay tiles of the kind known as “Portuguese”, of standard dimensions 41 cm × 25.5 cm.
Computational results
After the images had been acquired via the prototype vision system and the features extracted, a set of 1781 patterns was constructed for training and testing the classification system.
Before solving problem (6), we separated the blobs into just two classes: tiles with cracks, and those without cracks but with other defects (i.e., salt-and-pepper and generic unacceptable defects); we associated the value +1 with the first set, and gave the value −1 to all remaining blobs.
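The two-phase scheme (SVM-based feature selection feeding an ANN classifier over the ±1-labeled blobs) can be sketched end to end. Everything below is a hypothetical reconstruction on synthetic data: the library, the pipeline components, and all parameter values are our assumptions, not the paper's implementation.

```python
# Hedged end-to-end sketch: an l1 linear SVM selects a feature subset,
# then a multilayer perceptron classifies cracked (+1) vs other (-1) blobs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Synthetic stand-in for the 1781 extracted blob patterns (illustrative).
X, labels = make_classification(n_samples=1781, n_features=30,
                                n_informative=6, random_state=0)
y = np.where(labels == 1, 1, -1)   # cracks -> +1, remaining blobs -> -1

pipe = make_pipeline(
    SelectFromModel(LinearSVC(penalty="l1", dual=False, C=0.5,
                              max_iter=5000)),
    MLPClassifier(hidden_layer_sizes=(15,), max_iter=2000, random_state=0),
)
pipe.fit(X, y)
print(pipe.score(X, y))
```

In a real experiment the score would of course be measured on a held-out test split rather than the training set.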
Conclusions
The main contribution of this paper is the definition of a novel method, in two phases, for addressing classification problems by integrating support vector machines and artificial neural networks. Computational experiments show the effectiveness of this method, leading to excellent results for the problem we considered.
This methodology can be easily applied to different classes of problems (such as financial market forecasting), where again from a large number of features a subset must be
Acknowledgements
The authors would like to thank Luca Girolami from Sigma S.p.A. for providing us with the data from the simulator and for the useful contributions on different aspects discussed in this paper.
References (21)
- Bennett, K. P., & Bredensteiner, E. J. (1997). Feature minimization within decision trees. Computational Optimization and Applications.
- Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford University Press.
- Bradley, P. S., Mangasarian, O. L., & Street, W. N. (1998). Feature selection via mathematical programming. INFORMS Journal on Computing.
- et al. (1999). A successive overrelaxation back propagation algorithm for neural network training. IEEE Transactions on Neural Networks.
- Chang, C. C., Hsu, C. W., & Lin, C. J. (2004). A practical guide to support vector classification....
- Cherkassky, V., & Ma, Y. (2002). Selection of meta-parameters for support vector regression.
- Cristianini, N., & Shawe-Taylor, J. (2000). An introduction to support vector machines. Cambridge University Press.
- GAMS. (2005). General Algebraic Modeling System....
- Hecht-Nielsen, R. (1989). Theory of the back-propagation neural network. In International Joint Conference on Neural Networks....
- Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature.