
Automatic feature scaling and selection for support vector machine classification with functional data

Applied Intelligence

Abstract

Functional Data Analysis (FDA) has become a very important field in recent years due to its wide range of applications. However, there are several real-life applications in which hybrid functional data appear, i.e., data with both functional and static covariates. The classification of such hybrid functional data is a challenging problem that can be handled with the Support Vector Machine (SVM). Moreover, selecting the most informative features may yield drastic improvements in classification rates. In this paper, an embedded feature selection approach for SVM classification is proposed, in which the isotropic Gaussian kernel is modified by associating a bandwidth with each feature. The bandwidths are jointly optimized with the SVM parameters, yielding an alternating optimization approach. The effectiveness of our methodology was tested on benchmark data sets. Indeed, the proposed method achieved the best average performance when compared with 17 other feature selection and SVM classification approaches. A comprehensive sensitivity analysis of the parameters related to our proposal is also included, confirming its robustness.
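To make the idea concrete, the following is a minimal sketch (not the authors' code) of an SVM with an anisotropic Gaussian kernel in which each feature carries its own bandwidth, alternating between fitting the SVM with the bandwidths fixed and tuning the bandwidths on a validation split; bandwidths that fall below a threshold delta mark the corresponding features as removed. The function names, the validation objective, and the Nelder-Mead optimizer are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.svm import SVC

def aniso_gaussian_kernel(X1, X2, w):
    """K(x, x') = exp(-sum_v w_v^2 (x_v - x'_v)^2), one bandwidth w_v per feature."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) * w) ** 2
    return np.exp(-d2.sum(axis=2))

def fit_svm(X_tr, y_tr, w, C):
    svm = SVC(C=C, kernel="precomputed")
    svm.fit(aniso_gaussian_kernel(X_tr, X_tr, w), y_tr)
    return svm

def alternating_fit(X_tr, y_tr, X_val, y_val, C=1.0, delta=1e-5, max_iter=5):
    """Alternate between (1) an SVM fit with fixed bandwidths and
    (2) a bandwidth update that minimizes validation error."""
    w = np.ones(X_tr.shape[1])
    for _ in range(max_iter):
        svm = fit_svm(X_tr, y_tr, w, C)            # step 1: SVM, bandwidths fixed
        def val_error(w_new):                      # step 2: bandwidths, SVM fixed
            K = aniso_gaussian_kernel(X_val, X_tr, w_new)
            return 1.0 - np.mean(svm.predict(K) == y_val)
        w = minimize(val_error, w, method="Nelder-Mead").x
        w[np.abs(w) < delta] = 0.0                 # features below delta are removed
    return fit_svm(X_tr, y_tr, w, C), w
```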



Acknowledgements

This research was partially supported by grants MTM2015-65915-R (Ministerio de Ciencia e Innovación, Spain), P11-FQM-7603, P18-FR-2369, and FQM329 (Junta de Andalucía, Spain), FPU (Ministerio de Educación, Cultura y Deporte), and VI PPITUS (Universidad de Sevilla), all with EU ERDF funds, as well as by FBBVA-COSECLA. We also thank the team of the Scientific Computing Center of Andalucía (CICA) for the computing services provided. This support is gratefully acknowledged by the first author. The second author thanks ANID, FONDECYT project 1200221, and the Complex Engineering Systems Institute (ANID, PIA, FB0816).

Author information

Corresponding author

Correspondence to Asunción Jiménez-Cordero.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Sensitivity analysis

In order to study the robustness of the proposed algorithm with respect to the parameters involved, we ran a sensitivity analysis. We tested how sensitive our methodology is to the regularization parameter C, the elimination threshold δ below which features are removed, the maximum number of iterations of the alternating approach, and the bandwidths ω_v, v = 1, …, p + q.

First, we ran the alternating approach of Algorithm 1 five times to test the sensitivity of the algorithm with respect to the parameter C, computing the average accuracy on s3. Second, the sensitivity analysis for the elimination threshold δ was performed by running Algorithm 1 five times for each value in the set {10^-10, …, 10^-5}, spaced on a logarithmic scale; the average accuracy was again estimated on s3. Third, since the maximum number of iterations of the alternating approach may affect the classification rates, Algorithm 1 was run five times with the maximum number of iterations taking each value in the set {5, …, 10}, and the average accuracy on s3 was computed. Finally, we studied the convergence of the bandwidths. Note that, in this paper, convergence does not mean that the bandwidths tend to the same value in all runs, but rather that they end up above or below δ, and hence yield the same selected features, in most of the cases. For each of the five runs of Algorithm 1, the optimal values of the bandwidths after the alternating approach were recorded, with the goal of assessing the importance of the variables visually.

In each part of the sensitivity analysis, the remaining parameters not under study took the values given in Section 3.2. For instance, when the sensitivity with respect to C was analyzed, the elimination threshold was set to 10^-5 and the maximum number of iterations of the alternating approach to five. The results for the batch data set are depicted in Figs. 5 and 6, Figs. 7 and 8 show the trigonometric data set, Figs. 9 and 10 the pen data set, and Figs. 11 and 12 the retail data set. A sketch of this protocol is given below.
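The following hedged sketch focuses on the threshold δ; it reuses the illustrative alternating_fit and aniso_gaussian_kernel routines sketched after the abstract, and the data splits (training, validation, and the held-out sample s3) and the per-run reshuffling are assumptions, not the paper's exact protocol.

```python
import numpy as np

def delta_sensitivity(X_tr, y_tr, X_val, y_val, X_s3, y_s3, n_runs=5):
    """Average s3 accuracy over n_runs runs for each elimination threshold delta."""
    results = {}
    for delta in 10.0 ** np.arange(-10, -4):       # {10^-10, ..., 10^-5}
        accs = []
        for seed in range(n_runs):
            rng = np.random.default_rng(seed)
            idx = rng.permutation(len(y_tr))       # vary each run via a reshuffle
            svm, w = alternating_fit(X_tr[idx], y_tr[idx], X_val, y_val,
                                     C=1.0, delta=delta, max_iter=5)
            K = aniso_gaussian_kernel(X_s3, X_tr[idx], w)
            accs.append(np.mean(svm.predict(K) == y_s3))
        results[delta] = np.mean(accs)             # average accuracy on s3
    return results
```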

Fig. 5: Results of the sensitivity analysis for the batch data set

Fig. 6: Convergence of the bandwidths for the batch data set

Fig. 7: Results of the sensitivity analysis for the trigonometric data set

Fig. 8: Convergence of the bandwidths for the trigonometric data set

Fig. 9: Results of the sensitivity analysis for the pen data set

Fig. 10: Convergence of the bandwidths for the pen data set

Fig. 11: Results of the sensitivity analysis for the retail data set

Fig. 12: Convergence of the bandwidths for the retail data set

Appendix B: Analysis of sensitivity, specificity and area under the curve

This section provides three tables with additional performance metrics: sensitivity (Table 5), specificity (Table 6), and area under the curve (AUC, Table 7). The conclusions derived from these tables are discussed in Section 3.4. An illustration of how these metrics are computed appears after the tables.

Table 5: Result summary, sensitivity
Table 6: Result summary, specificity
Table 7: Result summary, area under the curve
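For reference, a minimal illustration (assuming binary labels and scikit-learn, not the authors' code) of how the three metrics are obtained from predicted labels and decision scores: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), and AUC from the ROC curve.

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

def report_metrics(y_true, y_pred, y_score):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)           # true positive rate (recall)
    specificity = tn / (tn + fp)           # true negative rate
    auc = roc_auc_score(y_true, y_score)   # area under the ROC curve
    return sensitivity, specificity, auc
```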


About this article


Cite this article

Jiménez-Cordero, A., Maldonado, S. Automatic feature scaling and selection for support vector machine classification with functional data. Appl Intell 51, 161–184 (2021). https://doi.org/10.1007/s10489-020-01765-6

