1 Introduction

First proposed by Vapnik et al., the support vector machine (SVM) is a machine learning method for solving binary classification problems [1–3]. It has attracted wide attention from scholars and has been applied in many fields in recent years [4–8]. With the development of wavelet analysis, researchers have sought to apply it to SVM because of its excellent properties. In 2004, Zhang et al. successfully combined wavelet analysis with SVM and proposed the wavelet support vector machine (WSVM). Their experimental results show that the wavelet kernel outperforms the Gaussian kernel, because the wavelet kernel can approximate arbitrary functions. WSVM immediately attracted wide attention; for example, Khatibinia et al. [9] applied a wavelet weighted least squares support vector machine to the seismic reliability assessment of RC structures including soil–structure interaction in 2013. Their numerical results show that the algorithm has better efficiency and computational advantages in seismic reliability assessment.

Although the introduction of the wavelet kernel greatly improves the performance of SVM, SVM still has some deficiencies and limitations. To overcome these shortcomings, Fung and Mangasarian [10] proposed the proximal support vector machine (PSVM) in 2001. To simplify the computation, PSVM replaces the inequality constraints of traditional SVM with equality constraints. Building on PSVM, Mangasarian et al. [11] proposed the generalized eigenvalue proximal support vector machine (GEPSVM) in 2006. GEPSVM drops the PSVM constraint that the two hyperplanes must be parallel: it makes the sample points of each class as close as possible to their own hyperplane and as far away as possible from the sample points of the other class. Thereafter, building on PSVM and GEPSVM, Jayadeva et al. [12] proposed the twin support vector machine (TWSVM). TWSVM solves for one hyperplane per class, imposes no parallelism constraint on the two hyperplanes, and converts the binary classification problem into two smaller quadratic programming problems (QPPs).

Because TWSVM has a solid theoretical foundation and solves problems effectively, many scholars have contributed to its study, and there have been many achievements. For example, Wang et al. [13] proposed a GA-based model selection for the smooth twin parametric-margin support vector machine (STPMSVM) in 2013. They improved the efficiency of TPMSVM in two ways. First, by introducing a quadratic function, they directly optimized the pair of QPPs of TPMSVM in the primal space, which clearly improves the training speed without loss of generalization. Second, they suggested a genetic algorithm (GA)-based model selection for STPMSVM in the primal space. In 2013, Chen et al. [14] proposed the Laplacian smooth twin support vector machine (Lap-STSVM) for semi-supervised classification. Rather than solving two QPPs in the dual space, they converted the primal constrained QPPs of Lap-TSVM into unconstrained minimization problems (UMPs); a smoothing technique then makes these UMPs twice differentiable, and a fast Newton–Armijo algorithm solves them. In 2013, Qi et al. [15] proposed a new structural twin support vector machine (S-TWSVM). Unlike existing methods based on structural information, S-TWSVM uses two hyperplanes to decide the category of new data, each model considering only one class's structural information. Peng et al. [16] proposed a robust minimum class variance twin support vector machine (RMCV-TWSVM), which effectively overcomes a shortcoming of TWSVM by introducing a pair of uncertain class variance matrices into its objective functions. Huang et al. [17] proposed the primal least squares twin support vector regression (PLSTSVR) in 2013. In 2014, Ding et al. [18] formulated a nonlinear version of the recently proposed LSPTSVM for binary nonlinear classification by introducing a nonlinear kernel into LSPTSVM, leading to a novel algorithm called nonlinear LSPTSVM (NLSPTSVM).

However, as with SVM, the choice of kernel function directly affects the performance of TWSVM. Currently, the most common kernel in TWSVM is the Gaussian radial basis function (RBF) kernel. Many experimental results demonstrate that it works well in TWSVM, but its generalization ability is relatively poor. Wavelet analysis has the characteristics of multivariate interpolation and sparse change, and it is well suited to analyzing local signals and detecting transient signals; the wavelet kernel function based on it can approximate arbitrary nonlinear functions. The application of the wavelet kernel in SVM shows that it performs well across many fields. Therefore, combining TWSVM with the wavelet analysis technique is worth research and analysis. Based on this, we propose the wavelet twin support vector machine (WTWSVM), in which the wavelet kernel function replaces the original Gaussian RBF kernel. As shown in the experimental part of this paper, WTWSVM is feasible: it greatly improves the classification accuracy and generalization ability of TWSVM, and it also expands the range of kernel functions available to TWSVM.

The rest of this paper is organized as follows: Section 2 briefly describes the mathematical model of TWSVM. Section 3 describes the wavelet kernel function and proposes WTWSVM. Section 4 analyzes the experimental results. Finally, Section 5 summarizes and concludes the paper.

2 Twin support vector machine

2.1 The mathematical model of TWSVM

We assume that there are $l$ training samples in the space $R^n$, each with $n$ attributes. Of these, $m_1$ samples belong to the positive class and $m_2$ to the negative class; we represent them by the matrices $A$ ($m_1 \times n$) and $B$ ($m_2 \times n$), respectively. In the linearly separable case, solving TWSVM amounts to finding two non-parallel hyperplanes in $R^n$:

$$x^{T} w_{1} + b_{1} = 0\;{\text{and}}\;x^{T} w_{2} + b_{2} = 0$$
(1)

In the nonlinearly separable case, we need to introduce a kernel function $K(x^T, C^T)$. The two hyperplanes of TWSVM then become:

$$K(x^{T} ,C^{T} )w_{1} + b_{1} = 0\;{\text{and}}\;K(x^{T} ,C^{T} )w_{2} + b_{2} = 0$$
(2)

The two hyperplanes are obtained by solving the following pair of QPPs:

$$\hbox{min} \frac{1}{2}\left\| {K(A,C^{T} )w_{1} + e_{1} b_{1} } \right\|^{2} + c_{1} e_{2}^{T} \zeta$$
(3)
$$\text{s.t.}\; - \left( {K\left( {B,C^{T} } \right)w_{1} + e_{2} b_{1} } \right) + \zeta \ge e_{2} ,\;\zeta \ge 0,$$
(4)
$$\hbox{min} \frac{1}{2}\left\| {K\left( {B,C^{T} } \right)w_{2} + e_{2} b_{2} } \right\|^{2} + c_{2} e_{1}^{T} \zeta$$
(5)
$$\text{s.t.}\; - \left( {K\left( {A,C^{T} } \right)w_{2} + e_{1} b_{2} } \right) + \zeta \ge e_{1} ,\;\zeta \ge 0,$$
(6)

In the above formulas, $C^T = [A\;B]^T$; $e_1$ is the column vector of ones with the same number of rows as $K(A, C^T)$, and $e_2$ is the column vector of ones with the same number of rows as $K(B, C^T)$; $\zeta$ is the slack vector; $A = [x_1^{(1)}, x_2^{(1)}, \ldots, x_{m_1}^{(1)}]^T$ and $B = [x_1^{(2)}, x_2^{(2)}, \ldots, x_{m_2}^{(2)}]^T$, where $x_j^{(i)}$ denotes the $j$th sample of the $i$th class; and $c_1$ and $c_2$ are two penalty parameters.

The distance between a test sample and the two hyperplanes determines which class the sample is assigned to: it belongs to the class whose hyperplane is nearer. That is, if

$$\left| {K\left( {x^{T} ,C^{T} } \right)w_{r} + b_{r} } \right| = \mathop {\hbox{min} }\limits_{l = 1,2} \left| {K\left( {x^{T} ,C^{T} } \right)w_{l} + b_{l} } \right|$$
(7)

then $x$ belongs to the $r$th class, where $r \in \{1,2\}$.
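To make the decision rule concrete, the following sketch implements (7) in Python. This is a minimal illustration of our own (the function name and array layout are our assumptions, not part of any TWSVM library); it presumes the kernel row for the test sample and the two hyperplane solutions have already been computed.

```python
import numpy as np

def twsvm_classify(k_row, w1, b1, w2, b2):
    """Assign a test sample to class 1 or 2 by rule (7).

    k_row    : 1-D array holding K(x^T, C^T) for the test sample x.
    (w1, b1) : solution of QPPs (3)-(4); (w2, b2) : solution of (5)-(6).
    """
    d1 = abs(float(np.dot(k_row, w1)) + b1)  # |K(x^T, C^T) w_1 + b_1|
    d2 = abs(float(np.dot(k_row, w2)) + b2)  # |K(x^T, C^T) w_2 + b_2|
    return 1 if d1 <= d2 else 2              # the nearer hyperplane wins
```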

In the two-dimensional case, Fig. 1 visually expresses the basic idea of the twin support vector machine. In Fig. 1, the two lines represent the two classification hyperplanes, and the purple and green dots represent the training points of class $1$ and class $-1$, respectively.

Fig. 1 The basic idea of TWSVM

3 Wavelet kernel function and WTWSVM

3.1 Wavelet kernel function

Lemma 1

[19] Let $\psi(x)$ be a mother wavelet, and let $a$ and $b$ denote the dilation and translation, respectively. If $x, z \in R^d$, then the dot-product wavelet kernel is

$$K\left( {x,z} \right) = \prod\limits_{i = 1}^{d} {\psi \left( {\frac{{x_{i} - b_{i} }}{{a_{i} }}} \right)} \psi \left( {\frac{{z_{i} - b_{i} }}{{a_{i} }}} \right)$$
(8)

And the translation-invariant wavelet kernel is

$$K\left( {x,z} \right) = \prod\limits_{i = 1}^{d} {\psi \left( {\frac{{x_{i} - z_{i} }}{{a_{i} }}} \right)}$$
(9)

In order to construct the translation-invariant wavelet kernel function, we select the Mexican Hat wavelet function as the mother wavelet. It is

$$\psi \left( x \right) = \left( {1 - x^{2} } \right)\exp \left( { - \frac{1}{2}x^{2} } \right)$$
(10)

Lemma 2

[20] The Mexican Hat wavelet kernel function that satisfies the translation-invariant kernel conditions is

$$K\left( {x,z} \right) = \prod\limits_{i = 1}^{d} {\left( {1 - \left( {\frac{{x_{i} - z_{i} }}{{a_{i} }}} \right)^{2} } \right)\exp \left( { - \frac{1}{2}\left( {\frac{{x_{i} - z_{i} }}{{a_{i} }}} \right)^{2} } \right)}$$
(11)
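For concreteness, formula (11) translates directly into code. Below is a minimal NumPy sketch of our own (not taken from [20]); it allows a separate dilation per dimension, with a scalar a broadcast as a shared value.

```python
import numpy as np

def mexican_hat_kernel(x, z, a=1.0):
    """Mexican Hat wavelet kernel of Eq. (11) for two d-dimensional samples."""
    u = (np.asarray(x, float) - np.asarray(z, float)) / a  # (x_i - z_i) / a_i
    return float(np.prod((1.0 - u**2) * np.exp(-0.5 * u**2)))

# Each factor equals psi(0) = 1 when x = z, so K(x, x) = 1:
print(mexican_hat_kernel([0.3, -1.2], [0.3, -1.2]))        # -> 1.0
print(mexican_hat_kernel([0.3, -1.2], [0.5, -0.7], a=2.0)) # a value in (-1, 1]
```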

Theorem 1

The following formula (12) is also a wavelet kernel function that satisfies the translation-invariant kernel conditions:

$$K\left( {x,z} \right) = \left( {d - \sum\limits_{i = 1}^{d} {\left( {\frac{{x_{i} - z_{i} }}{{a_{i} }}} \right)}^{2} } \right)\exp \left( { - \frac{1}{2}\sum\limits_{i = 1}^{d} {\left( {\frac{{x_{i} - z_{i} }}{{a_{i} }}} \right)}^{2} } \right)$$
(12)

Recall that a translation-invariant kernel is an admissible support vector kernel if and only if its Fourier transform is non-negative.

Proof of Theorem 1

The Fourier transform of a translation-invariant kernel $K(x,z) = K(x - z)$ is

$$F\left[ K \right]\left( \omega \right) = \left( {2\pi } \right)^{{ - \frac{d}{2}}} \int_{{R^{d} }} {\exp \left\{ { - j\left\langle {\omega ,x} \right\rangle } \right\}} K\left( x \right)dx$$
(13)

Taking a common dilation $a_i = a > 0$ in formula (12) and substituting it into (13), each coordinate contributes a one-dimensional Gaussian integral: $\int_{R} {e^{ - j\omega_{i} x_{i} } e^{ - \frac{{x_{i}^{2} }}{{2a^{2} }}} dx_{i} } = a\sqrt {2\pi } \, e^{ - \frac{{a^{2} \omega_{i}^{2} }}{2}}$ and $\int_{R} {e^{ - j\omega_{i} x_{i} } \left( {\frac{{x_{i} }}{a}} \right)^{2} e^{ - \frac{{x_{i}^{2} }}{{2a^{2} }}} dx_{i} } = a\sqrt {2\pi } \left( {1 - a^{2} \omega_{i}^{2} } \right)e^{ - \frac{{a^{2} \omega_{i}^{2} }}{2}}$. Therefore

$$\begin{aligned} F[K](\omega ) & = \left( {2\pi } \right)^{{ - \frac{d}{2}}} \int\limits_{{R^{d} }} {\exp \left\{ { - j\left\langle {\omega ,x} \right\rangle } \right\}} \left( {d - \sum\limits_{i = 1}^{d} {\left( {\frac{{x_{i} }}{a}} \right)^{2} } } \right)\exp \left\{ { - \frac{1}{2}\sum\limits_{i = 1}^{d} {\left( {\frac{{x_{i} }}{a}} \right)^{2} } } \right\}dx_{1} \ldots dx_{d} \\ & = \left( {2\pi } \right)^{{ - \frac{d}{2}}} \left( {a\sqrt {2\pi } } \right)^{d} e^{{ - \frac{{a^{2} }}{2}\sum\limits_{i = 1}^{d} {\omega_{i}^{2} } }} \left( {d - \sum\limits_{i = 1}^{d} {\left( {1 - a^{2} \omega_{i}^{2} } \right)} } \right) \\ & = a^{d + 2} e^{{ - \frac{{a^{2} }}{2}\sum\limits_{i = 1}^{d} {\omega_{i}^{2} } }} \sum\limits_{i = 1}^{d} {\omega_{i}^{2} } \\ \end{aligned}$$

Since $a > 0$, $F[K](\omega) \ge 0$ for all $\omega$, so $K(x,z)$ is an admissible support vector kernel.□
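As a quick numerical sanity check (our own addition, not part of the original proof), a one-dimensional quadrature of (13) can be compared with the closed form just derived, $F[K](\omega) = a^{d+2}\,\omega^2 e^{-a^2\omega^2/2}$ for $d = 1$:

```python
import numpy as np

# d = 1, a = 1: K(x) = (1 - x^2) exp(-x^2 / 2); its Fourier transform
# should match w^2 * exp(-w^2 / 2) and, in particular, be non-negative.
xs = np.linspace(-20.0, 20.0, 4001)
dx = xs[1] - xs[0]
K = (1.0 - xs**2) * np.exp(-0.5 * xs**2)
for w in (0.0, 0.5, 1.0, 2.0):
    ft = (2.0 * np.pi) ** -0.5 * np.sum(np.exp(-1j * w * xs) * K) * dx
    print(w, round(ft.real, 6), round(w**2 * np.exp(-0.5 * w**2), 6))
```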

Figures 2 and 3 plot the Mexican Hat wavelet kernel function at the test point 0.1 for d = 1 and different values of a.

Fig. 2 The graph of the Mexican Hat wavelet kernel function (d = 1, smaller values of a) at the test point 0.1

Fig. 3 The graph of the Mexican Hat wavelet kernel function (d = 1, larger values of a) at the test point 0.1

From Fig. 2, we can see that the learning ability of the Mexican Hat wavelet kernel function is very good, especially when a takes smaller values. From Fig. 3, we can also see that among the larger values of a, the greater the value of a, the smoother the curve; this indicates better generalization ability when a is larger. Determining the parameter on real data is therefore important, and different data sets call for different values of a. In summary, with an appropriate choice of parameters, the wavelet kernel function offers both good learning ability and good generalization ability.
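The behaviour just described is easy to reproduce numerically. The following sketch (our own illustration, not the code behind Figs. 2 and 3) evaluates the one-dimensional kernel against the fixed test point 0.1 for several dilations; the mean absolute increment along the curve serves as a rough smoothness proxy, shrinking as a grows.

```python
import numpy as np

def k1d(x, z=0.1, a=1.0):
    # One-dimensional Mexican Hat wavelet kernel, Eq. (11) with d = 1.
    u = (x - z) / a
    return (1.0 - u**2) * np.exp(-0.5 * u**2)

xs = np.linspace(-3.0, 3.0, 601)
for a in (0.2, 0.5, 1.0, 2.0, 5.0):
    ys = k1d(xs, a=a)
    # Small a: response concentrated near x = 0.1; large a: slowly varying curve.
    print(f"a={a}: K(1.0, 0.1)={k1d(1.0, a=a):+.3f}, "
          f"mean |increment|={np.abs(np.diff(ys)).mean():.4f}")
```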

3.2 Wavelet twin support vector machine

The traditional TWSVM uses the Gaussian radial basis kernel function; that is, $K(x^T, C^T)$ in the mathematical model of TWSVM is

$$K\left( {x,x_{i} } \right) = \exp \left( { - \frac{{\left\| {x - x_{i} } \right\|^{2} }}{{2\sigma^{2} }}} \right)$$
(14)

Although the learning ability of the Gaussian RBF kernel is very strong, its generalization ability is relatively weak, and this directly affects the performance of TWSVM. Many algorithms optimize the parameters of the Gaussian RBF kernel to improve its performance, but they cannot fundamentally solve the kernel selection problem in TWSVM. As noted in Section 1, wavelet analysis is well suited to analyzing local signals and detecting transient signals, and the wavelet kernel function can approximate arbitrary nonlinear functions, so the combination of the wavelet analysis technique and TWSVM is significant and worth studying. Based on this, we propose WTWSVM. Its essence is to use the wavelet kernel function in place of the Gaussian RBF kernel of traditional TWSVM; that is, $K(x^T, C^T)$ in the mathematical model of TWSVM becomes

$$K\left( {x,z} \right) = \left( {d - \sum\limits_{i = 1}^{d} {\left( {\frac{{x_{i} - z_{i} }}{{a_{i} }}} \right)}^{2} } \right)\exp \left( { - \frac{1}{2}\sum\limits_{i = 1}^{d} {\left( {\frac{{x_{i} - z_{i} }}{{a_{i} }}} \right)}^{2} } \right)$$
(15)

This algorithm introduces the wavelet kernel function into TWSVM, realizing the combination of wavelet analysis techniques and TWSVM. WTWSVM improves the performance of TWSVM fundamentally, and it also expands the range of kernel functions available to TWSVM, promoting its further development.
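In implementation terms, the substitution is a single change: wherever TWSVM forms the kernel matrices $K(A, C^T)$ and $K(B, C^T)$, formula (15) is evaluated instead of the Gaussian RBF (14). Below is a minimal NumPy sketch of our own, assuming one shared dilation a across all dimensions.

```python
import numpy as np

def wavelet_kernel_matrix(X, C, a=1.0):
    """K[i, j] from formula (15) with a_i = a, for rows of X against rows of C."""
    d = X.shape[1]
    diff = (X[:, None, :] - C[None, :, :]) / a  # pairwise scaled differences
    s = (diff ** 2).sum(axis=2)                 # sum_i ((x_i - z_i) / a)^2
    return (d - s) * np.exp(-0.5 * s)

# A: positive samples, B: negative samples, C^T = [A B]^T as in Section 2.1
A = np.random.randn(5, 3)
B = np.random.randn(4, 3)
C = np.vstack([A, B])
K_A = wavelet_kernel_matrix(A, C, a=1.5)  # plays the role of K(A, C^T) in (3)
K_B = wavelet_kernel_matrix(B, C, a=1.5)  # plays the role of K(B, C^T) in (5)
```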

3.3 The flow of WTWSVM

The algorithm steps of WTWSVM are as follows (a code sketch of the grid-search loop is given after the steps):

Step 1

Import the data sets and randomly divide each data set into two parts: 80 % for training and 20 % for testing.

Step 2

Initialize the relevant parameter values of the algorithm.

Step 3

Take the 80 % training portion of the data. To solve for the classification planes, the wavelet kernel function maps the data into a high-dimensional feature space where they become linearly separable. Determine the value of a and the values of $c_1$ and $c_2$ in TWSVM by the grid search method.

Step 4

Calculate the classification accuracy with the parameter values from Step 3.

Step 5

Determine whether this classification accuracy is the global optimum. If it is, jump to Step 6; if not, jump to Step 7.

Step 6

Update the global optimum value and record these parameter values as the optimal parameter values.

Step 7

Determine whether the end condition of the grid search is reached. If not, jump to Step 3; if so, jump to Step 8.

Step 8

Substitute the optimal parameters obtained from training into TWSVM; the final model of WTWSVM is then determined.

Step 9

After the model is determined, use the remaining 20 % of the data for testing to obtain the test classification accuracy.

Step 10

Stop operations.
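Steps 3 through 8 amount to an exhaustive grid search over $(a, c_1, c_2)$. The sketch below shows only the control flow; train_fn and score_fn are hypothetical placeholders for a TWSVM solver and an accuracy evaluator (the paper supplies neither, and it does not specify which split Step 4 scores on, so the sketch scores on the training data).

```python
import itertools

def grid_search_wtwsvm(train_fn, score_fn, X_tr, y_tr,
                       a_grid=(0.25, 0.5, 1.0, 2.0, 4.0),
                       c_grid=(0.1, 1.0, 10.0, 100.0)):
    """Steps 3-8: pick the (a, c1, c2) with the best classification accuracy.

    train_fn(X, y, a, c1, c2) -> model  and  score_fn(model, X, y) -> accuracy
    are placeholders for a concrete TWSVM implementation.
    """
    best_acc, best = -1.0, None
    for a, c1, c2 in itertools.product(a_grid, c_grid, c_grid):  # Step 3
        model = train_fn(X_tr, y_tr, a, c1, c2)
        acc = score_fn(model, X_tr, y_tr)                        # Step 4
        if acc > best_acc:                                       # Steps 5-6
            best_acc, best = acc, (a, c1, c2)
    return train_fn(X_tr, y_tr, *best), best                     # Step 8
```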

The algorithm flowchart of WTWSVM is shown in Fig. 4. It gives an intuitive view of the algorithm process of WTWSVM proposed in this paper, and the ten steps described above are expressed clearly in it.

Fig. 4 The algorithm flowchart of WTWSVM

4 The analysis of experimental results

This paper selects nine common data sets from the UCI machine learning repository to test and validate the proposed algorithm: the Ionosphere, Australian, Pima-Indian, Sonar, Votes, Haberman, Bupa, Wisconsin breast cancer and German data sets. In each case, 80 % of the data are used for training and the remaining 20 % for testing. Since our aim is to verify that the wavelet kernel function improves the performance of TWSVM, we run only the nonlinear experiments. All experiments are carried out in MATLAB on a PC. The parameter values are determined by the grid search method, and they differ across data sets.

The characteristics of the nine data sets in the experiment are shown in Table 1.

Table 1 The data characteristics of the data sets

We run the experiments with the Gaussian radial basis kernel function and the wavelet kernel function, respectively, and then compare their results; the comparison demonstrates that the proposed WTWSVM is feasible. The experimental results are shown in Table 2.

Table 2 The experimental results

To make the experimental results easier to inspect visually, they are also plotted as an effect diagram in Fig. 5.

Fig. 5 The effect diagram of the experimental results

From Fig. 5, we can see that the classification accuracy curve of WTWSVM lies clearly above that of TWSVM (Gaussian kernel) on every data set; that is, the classification accuracy of WTWSVM is higher than that of TWSVM (Gaussian kernel). Therefore, from Table 2 and Fig. 5 we draw the following conclusion: WTWSVM is feasible, and it improves the performance of TWSVM significantly. The reason for this good effect is the wavelet kernel function used by WTWSVM: wavelet analysis is well suited to analyzing local signals and detecting transient signals, and introducing wavelet technology improves both the classification performance and the generalization ability of TWSVM.

This experiment still has a shortcoming: the grid search method used to find the optimal parameters is relatively inefficient and often fails to find the true optimum. This is worth addressing in further studies, although set against the advantages of the proposed algorithm it is a minor problem. The successful use of the wavelet kernel function in WTWSVM not only improves the classification accuracy and performance of TWSVM, but also expands the range of kernel functions available to it, which is beneficial to the further development of TWSVM.

5 Conclusion

Classification algorithms have developed rapidly in recent years, and the twin support vector machine, as an excellent machine learning method, has been a research hot spot in machine learning. However, TWSVM still has some problems and shortcomings. For the kernel function selection problem in TWSVM, we propose WTWSVM in this paper. The algorithm makes use of the features of wavelet analysis and applies wavelet analysis techniques to the twin support vector machine. This greatly improves the performance of TWSVM, expands the range of kernel functions available to it, broadens the research directions of TWSVM, and will play a significant role in promoting its further development. However, the parameters of the algorithm are determined by the grid search method, whose efficiency is relatively low and which makes it difficult to find the optimal parameters. Therefore, to further improve the performance of the algorithm and the classification accuracy of TWSVM, our future work will start from this point and continue to optimize the parameter search.