Neural Networks

Volume 20, Issue 5, July 2007, Pages 590-597

Integrating support vector machines and neural networks

https://doi.org/10.1016/j.neunet.2006.12.003

Abstract

Support vector machines (SVMs) are a powerful technique developed in the last decade to effectively tackle classification and regression problems. In this paper we describe how support vector machines and artificial neural networks can be integrated in order to classify objects correctly. This technique has been successfully applied to the problem of determining the quality of tiles. Using an optical reader system, some features are automatically extracted, then a subset of the features is determined and the tiles are classified based on this subset.

Introduction

Support vector machines (SVMs) are a novel and popular learning technique for solving different classification problems (Chang et al., 2004; Cristianini and Shawe-Taylor, 2000; Vapnik, 1995) and data mining problems in various areas such as image processing, signal processing, pattern recognition, regression and so on.

A classification task usually involves a training set containing "target values" (class labels) and several "attributes" (features). The goal of SVMs is to produce a model that predicts target values for new data instances. More specifically, given a training set of attribute–label pairs $S=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$, where $x_i\in\mathbb{R}^n$ and $y_i\in\{-1,1\}$ for $i=1,\dots,N$, and a map $\phi:\mathbb{R}^n\to\mathbb{R}^m$, the optimization problem
$$\min_{w,b,\xi}\ \tfrac{1}{2}w^T w + C\sum_{i=1}^{N}\xi_i \quad \text{subject to} \quad y_i\bigl(w^T\phi(x_i)+b\bigr)\ge 1-\xi_i,\quad \xi_i\ge 0,\quad i=1,\dots,N \tag{1}$$
must be solved in order to obtain the vector $w$ and the scalar $b$. The classification function $\operatorname{sgn}\{w^T\phi(x)+b\}$ is then used to discriminate between the two sets of elements.
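As a concrete illustration of problem (1), the following sketch trains a linear soft-margin SVM on a toy two-class data set and recovers the classifier $\operatorname{sgn}\{w^T x + b\}$; the data and the scikit-learn solver are our own choices, not the paper's.

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class data with labels in {-1, +1}, as in the paper's formulation.
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([-1, -1, 1, 1])

# SVC solves the dual of problem (1); kernel="linear" corresponds to phi = identity.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The decision function is w^T x + b; its sign gives the predicted class.
w, b = clf.coef_[0], clf.intercept_[0]
pred = np.sign(X @ w + b)
```

On this linearly separable toy set the recovered sign function reproduces the training labels exactly.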

In Problem (1), the slack variables $\xi_i$ assume nonzero values only in correspondence with points that are misclassified. The objective function has two terms: the first attempts to maximize the distance between the bounding planes, while the other minimizes the classification errors. The parameter $C>0$ is introduced to balance the emphasis between these two goals. A small value of $C$ indicates that most of the importance is placed on separating the two planes; a large value of $C$, on the contrary, indicates that it is important to reduce the classification error. Therefore, finding the correct value of $C$ is typically an experimental task (Cherkassky and Ma, 2002; Joachims, 2002; Wahba et al., 2000), accomplished via a training set and cross-validation (Stone, 1974).
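The experimental tuning of $C$ via cross-validation can be sketched as a grid search; the candidate grid and synthetic data below are illustrative choices, not values from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic two-class data standing in for a real training set.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# 5-fold cross-validation over a logarithmic grid of C values.
grid = GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 0.1, 1, 10, 100]}, cv=5)
grid.fit(X, y)
best_C = grid.best_params_["C"]
```

The value of `best_C` maximizing the cross-validated accuracy is then used to train the final classifier on the full training set.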

In addition to classification, support vector machines can be used to solve the feature selection problem effectively: while discriminating between two different classes of data, the most important features that allow the separation of the two classes are selected at the same time.

In this paper we present a novel procedure that integrates support vector machines and artificial neural networks in order to solve a specific real-world problem. Using SVMs we select the “best” features that will be used as inputs to the artificial neural networks. In Section 2 the feature selection problem is presented (see Bradley, Mangasarian, and Street (1998) and references therein for a more detailed explanation of the problem) with a parametric objective function and linear constraints. We briefly describe artificial neural networks in Section 3. In Section 4 the problem of determining the quality of tiles is described. Finally, Section 5 contains numerical test results as well as comparisons with two different normalizations.

We briefly describe our notation now. All vectors will be column vectors and will be indicated by a lower case italic letter ($x, y, \dots$). The scalar (inner) product of two vectors $x$ and $y$ in the $n$-dimensional real space $\mathbb{R}^n$ will be denoted by $x^T y$. A column vector of ones of arbitrary dimension will be denoted by $e$. Matrices will be indicated by an upper case italic letter. For a matrix $A$ we will denote the transpose with $A^T$. For a vector $v\in\mathbb{R}^n$, $v_*$ is the vector with components
$$(v_*)_j=\begin{cases}1 & \text{if } v_j>0\\ 0 & \text{otherwise.}\end{cases}$$
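The componentwise indicator vector $v_*$ is straightforward to compute; a minimal NumPy sketch:

```python
import numpy as np

# v_* has a 1 wherever the component of v is strictly positive, else 0.
v = np.array([-1.5, 0.0, 2.0, 0.3])
v_star = (v > 0).astype(int)  # -> array([0, 0, 1, 1])
```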

Section snippets

Feature selection via mathematical programming

In this section we discuss the important problem of feature selection (Bennett and Bredensteiner, 1997, John et al., 1994, Kira and Rendell, 1992, Kittler, 1986, Le Cun et al., 1990, Mangasarian, 1996, Stoppiglia et al., 2003). In particular in Stoppiglia et al. (2003), the authors propose a probe feature method that allows one to rank and select features.
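The paper's exact parametric formulation is not reproduced in this snippet; a commonly used surrogate with the same spirit, which we sketch here purely for illustration, is an $\ell_1$-penalized linear SVM whose sparse weight vector directly identifies the selected features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic data: 20 features, only a few of which carry class information.
X, y = make_classification(n_samples=300, n_features=20, n_informative=4,
                           n_redundant=0, random_state=0)

# The L1 penalty drives many components of w to zero; the surviving
# nonzero components index the selected features.
svm = LinearSVC(penalty="l1", dual=False, C=0.5, max_iter=5000).fit(X, y)
selected = np.flatnonzero(np.abs(svm.coef_[0]) > 1e-6)
```

The indices in `selected` are the features retained for the subsequent classification stage.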

The feature selection problem consists in discriminating between two finite sets of points in the n-dimensional feature space using a

Neural network model: The multilayer perceptron

The other tool that we plan to use in our application is Artificial Neural Networks (ANNs) (Bishop, 1995; Hecht-Nielsen, 1989). Based on a biological analogy, artificial neural networks try to emulate the human brain's ability to learn from examples and incomplete data, and to generalize concepts.

An artificial neural network consists of a pool of simple processing units which communicate by sending signals to each other over a large number of weighted connections.
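The behaviour of such units can be made concrete with a minimal forward pass, sketched here with sigmoid activations and randomly initialized weights (all choices ours, for illustration only):

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Each unit computes a weighted sum of its inputs, then a sigmoid."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))      # hidden-layer activations
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))   # output activation

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)  # 2 inputs -> 3 hidden units
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)  # 3 hidden units -> 1 output
out = forward(np.array([0.5, -0.2]), W1, b1, W2, b2)
```

Training consists of adjusting the weight matrices `W1`, `W2` (and the biases) so that the network's outputs match the desired targets.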

We will use multilayer feed

Outline of the problem

In this section we describe the real-world application we plan to solve. The goal is to determine the quality of objects using a technique that integrates support vector machines and artificial neural networks. Using support vector machines we select a subset of the features of the objects while the final classification of the object is achieved via an ANN.
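The two-phase scheme (SVM-based feature selection followed by ANN classification) can be sketched as a single pipeline; the $\ell_1$ linear SVM, the MLP configuration, and the synthetic data below are our own illustrative stand-ins for the paper's components.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Synthetic stand-in for the tile feature data.
X, y = make_classification(n_samples=400, n_features=30, n_informative=5,
                           n_redundant=0, random_state=1)

pipe = Pipeline([
    # Phase 1: sparse SVM selects a subset of the features.
    ("select", SelectFromModel(
        LinearSVC(penalty="l1", dual=False, C=0.5, max_iter=5000))),
    # Phase 2: a multilayer perceptron classifies using that subset.
    ("classify", MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                               random_state=1)),
])
pipe.fit(X, y)
acc = pipe.score(X, y)
```

In practice the accuracy would of course be measured on a held-out test set rather than on the training data.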

The objects we deal with and that we have to inspect are clay tiles of the kind known as "Portuguese", with standard dimensions of 41 cm × 25.5 cm.

Computational results

After the images were acquired via the prototype vision system and the features extracted, a set of 1781 patterns was constructed for training and testing the classification system.

Before solving problem (6), we separated the blobs into just two classes: tiles with cracks, and those without cracks but with other defects (i.e., salt-and-pepper and generic unacceptable defects); we associated the value 1 with the first set and the value −1 with all remaining blobs.
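This binary encoding of the defect classes amounts to a simple mapping; the class names below are hypothetical placeholders, since only the {+1, −1} encoding is stated in the text.

```python
# Hypothetical defect labels for four blobs; the names are illustrative,
# only the {+1, -1} encoding follows the paper.
labels = ["crack", "salt_and_pepper", "crack", "generic_defect"]
y = [1 if lab == "crack" else -1 for lab in labels]  # -> [1, -1, 1, -1]
```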

Conclusions

The main contribution of this paper is the definition of a novel method, in two phases, for addressing classification problems by integrating support vector machines and artificial neural networks. Computational experiments show the effectiveness of this method, leading to excellent results for the problem we considered.

This methodology can be easily applied to different classes of problems (such as financial market forecasting), where again from a large number of features a subset must be

Acknowledgements

The authors would like to thank Luca Girolami from Sigma S.p.A. for providing us with the data from the simulator and for the useful contributions on different aspects discussed in this paper.

References (21)

  • K.P. Bennett et al.

    Feature minimization within decision trees

    Computational Optimization and Applications

    (1997)
  • C. Bishop

    Neural networks for pattern recognition

    (1995)
  • P.S. Bradley et al.

    Feature selection via mathematical programming

    INFORMS Journal on Computing

    (1998)
  • R. Capparuccia et al.

    A successive overrelaxation backpropagation algorithm for neural network training

    IEEE Transactions on Neural Networks

    (1999)
  • Chang, C. C., Hsu, C. W., & Lin, C. J. (2004). A practical guide to support vector classification....
  • V. Cherkassky et al.

    Selection of meta-parameters for support vector regression

  • N. Cristianini et al.

    An introduction to support vector machines

    (2000)
  • GAMS. (2005). General Algebraic Modelling System....
  • Hecht-Nielsen, R. (1989). Theory of the backpropagation neural network. In International joint conference on neural...
  • D.E. Rumelhart et al.

    Learning representations by back-propagating errors

    Nature

    (1986)
There are more references available in the full text version of this article.

Cited by (22)

  • Using least square support vector regression with genetic algorithm to forecast beta systematic risk

    2015, Journal of Computational Science
    Citation Excerpt:

    Neural nets, designed to pick up nonlinear patterns from long time-series, are of great interest to researchers. Though ANN has strong parallel processing and fault tolerance ability, the practicability of ANN is less than ideal due to several weaknesses, such as over-fitting, slow convergence velocity, and the problem of easily becoming trapped in local optima [30]. Armstrong and Green [23] suggested that neural networks should be avoided because the method ignores prior knowledge and because the results are difficult to understand.

  • Shared domains of competence of approximate learning models using measures of separability of classes

    2012, Information Sciences
    Citation Excerpt:

    Recent contributions include new developments in classification assuming fixed probability distributions of the data [38], the use of recursive SVMs to tackle the dimensionality problem [36], the study of formulations of the loss function in order to deal with imbalanced data sets directly [39] or simultaneously selects relevant features during classifier construction [24]. SVMs and ANNs are very related in their foundations and their integration has been already studied [10], and the analysis on the different properties they model and their advantages have been studied [15]. It is well-known that the prediction capabilities of ANNs and SVMs in classification are dependent on the problem’s characteristics.
