Abstract
Functional Data Analysis (FDA) has become an important field in recent years due to its wide range of applications. However, in several real-life applications the data are hybrid functional data, i.e., data with both functional and static covariates. The classification of such hybrid functional data is a challenging problem that can be handled with the Support Vector Machine (SVM). Moreover, selecting the most informative features may lead to drastic improvements in the classification rates. In this paper, an embedded feature selection approach for SVM classification is proposed, in which the isotropic Gaussian kernel is modified by associating a bandwidth to each feature. The bandwidths are jointly optimized with the SVM parameters, yielding an alternating optimization approach. The effectiveness of our methodology was tested on benchmark data sets, where the proposed method achieved the best average performance when compared to 17 other feature selection and SVM classification approaches. A comprehensive sensitivity analysis of the parameters related to our proposal was also included, confirming its robustness.
Acknowledgements
Research partially supported by research grants MTM2015-65915-R (Ministerio de Ciencia e Innovación, Spain), P11-FQM-7603, P18-FR-2369, FQM329 (Junta de Andalucía, Spain), FPU (Ministerio de Educación, Cultura y Deporte), VI PPITUS (Universidad de Sevilla), all with EU ERDF funds, as well as FBBVA-COSECLA. Moreover, we thank the team of the Scientific Computing Center of Andalucía (CICA) for the computing services provided. This support is gratefully acknowledged by the first author. The second author would like to thank ANID, FONDECYT project 1200221, and the Complex Engineering Systems Institute (ANID, PIA, FB0816).
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Sensitivity analysis
To study the robustness of the proposed algorithm with respect to the parameters involved, we ran a sensitivity analysis. We tested how sensitive our methodology is to the regularization parameter C, the threshold δ at which features are removed, the maximum number of iterations of the alternating approach, and the bandwidths ω_v, v = 1,…,p + q.
First, we ran the alternating approach of Algorithm 1 five times to test the sensitivity of the algorithm with respect to the parameter C, computing the average accuracy on the sample s3. Second, the sensitivity analysis for the elimination threshold δ was performed by running Algorithm 1 five times for the values in the set {10⁻¹⁰,…, 10⁻⁵}, on a logarithmic scale, with the average accuracy again estimated on s3. Third, since the maximum number of iterations of the alternating approach may affect the classification rates, Algorithm 1 was run five times with the maximum number of iterations belonging to the set {5,…, 10}, and the average accuracy on the sample s3 was then computed.
Finally, we studied the convergence of the bandwidths. Note that in this paper, convergence does not mean that the bandwidths tend to the same value in all the runs, but that they fall on the same side of δ (greater or less than δ), and therefore yield the same selected features in most of the cases. For each of the five runs of Algorithm 1, the optimal values of the bandwidths after the alternating approach were obtained. The goal is to assess the importance of the variables visually.
In all the sensitivity analyses, the parameters that were not under study took the values given in Section 3.2. For instance, when the sensitivity with respect to C was analyzed, the elimination threshold was equal to 10⁻⁵, and the maximum number of iterations of the alternating approach was set to five.
Plots of the results of the sensitivity analysis for all the above-mentioned parameters on the batch data set are depicted in Figs. 5 and 6. Figures 7 and 8 depict the results for the trigonometric data set, whereas the results for the pen data set are shown in Figs. 9 and 10. Finally, Figs. 11 and 12 show the sensitivity analysis for the retail data set.
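The notion of bandwidth convergence used above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes, for static features only, a kernel of the form K(x, y) = exp(−Σ_v ω_v (x_v − y_v)²) and that a feature is kept when ω_v > δ. The function names and the bandwidth values in `runs` are hypothetical, and the stability check requires agreement in all runs, which is stricter than the "most of the cases" criterion in the text.

```python
import math

def anisotropic_gaussian_kernel(x, y, omega):
    """Gaussian kernel with one bandwidth per feature (assumed form):
    K(x, y) = exp(-sum_v omega[v] * (x[v] - y[v])**2)."""
    return math.exp(-sum(w * (a - b) ** 2 for w, a, b in zip(omega, x, y)))

def selected_features(omega, delta=1e-5):
    """Indices of the features whose bandwidth exceeds the threshold delta."""
    return [v for v, w in enumerate(omega) if w > delta]

def selection_is_stable(runs, delta=1e-5):
    """True if every run keeps the same feature set, even though the
    bandwidth values themselves differ from run to run."""
    sets = [tuple(selected_features(omega, delta)) for omega in runs]
    return all(s == sets[0] for s in sets)

# Hypothetical bandwidths obtained in two runs (illustrative values only).
runs = [[0.8, 1e-9, 2.3], [1.1, 3e-8, 1.7]]
kept = selected_features(runs[0])     # second feature falls below delta
stable = selection_is_stable(runs)    # same features kept in both runs

# A feature with zero bandwidth has no influence on the kernel value:
# the two points differ only in the second coordinate.
k_same = anisotropic_gaussian_kernel([1.0, 5.0, 2.0], [1.0, -5.0, 2.0],
                                     [0.8, 0.0, 2.3])
```

In this sketch the two runs end with different bandwidth values but remove the same feature, which is the sense in which the bandwidths are said to converge.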
Appendix B: Analysis of sensitivity, specificity and area under the curve
This section provides three tables with new performance metrics, namely sensitivity (Table 5), specificity (Table 6) and Area under the Curve (Table 7). More details about the conclusions derived from these tables can be seen in Section 3.4.
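For reference, the three metrics can be computed as in the following Python sketch. It uses the standard definitions (sensitivity as the true positive rate, specificity as the true negative rate, and the AUC as the fraction of positive-negative pairs ranked correctly by the classifier score); the labels, scores, and function names are illustrative assumptions and are not taken from the paper's experiments.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise comparison of scores
    (ties between a positive and a negative count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative labels and SVM-like decision scores (made-up values).
y_true = [1, 1, 1, -1, -1]
scores = [0.9, 0.4, -0.2, 0.1, -0.7]
y_pred = [1 if s > 0 else -1 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
```

The pairwise AUC is quadratic in the sample size, which is acceptable for a small illustration; a rank-based computation would be preferable for large test samples.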
Cite this article
Jiménez-Cordero, A., Maldonado, S. Automatic feature scaling and selection for support vector machine classification with functional data. Appl Intell 51, 161–184 (2021). https://doi.org/10.1007/s10489-020-01765-6
DOI: https://doi.org/10.1007/s10489-020-01765-6