Abstract
In high-dimensional data modeling, Multivariate Adaptive Regression Splines (MARS) is a popular nonparametric regression technique used to describe the nonlinear relationship between a response variable and the predictors with the help of splines. MARS uses piecewise linear functions for local fits and applies an adaptive procedure to select the number and location of the breaking points (called knots). The function estimate is generated via a two-step procedure: forward selection and backward elimination. In the first step, a large number of local fits is obtained by selecting a large number of knots via a lack-of-fit criterion; in the second, the least contributing local fits, or knots, are removed. In the conventional adaptive spline procedure, knots are selected from the set of all distinct data points, which makes the forward selection procedure computationally expensive and leads to high local variance. To avoid this drawback, it is possible to restrict the knot points to a subset of the data points. In this context, a new method for knot selection is proposed, based on a mapping approach such as self-organizing maps. With this method, fewer but more representative data points become eligible to be used as knots for function estimation in the forward step of MARS. The proposed method is applied to many simulated and real datasets, and the results show that it yields a time-efficient forward step for knot selection and model estimation without degrading the model accuracy and prediction performance.
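To make the proposed forward-step restriction concrete, the minimal sketch below trains a small one-dimensional self-organizing map on a single predictor and snaps each learned unit weight to the nearest observation, yielding a reduced set of knot candidates. It is a numpy illustration under our own naming and default parameters, not the authors' implementation.

```python
import numpy as np

def som_knot_candidates(x, n_units=10, n_iters=500, lr0=0.5, sigma0=None, seed=0):
    """Reduce the knot candidate set for one predictor via a 1-D SOM.

    A small self-organizing map is trained on the predictor values, and
    each unit's weight is then snapped to the nearest observed data point,
    so the forward step of MARS searches far fewer candidate knots.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    if sigma0 is None:
        sigma0 = n_units / 2.0
    w = np.linspace(x.min(), x.max(), n_units)  # unit weights over data range
    idx = np.arange(n_units)
    for t in range(n_iters):
        lr = lr0 * np.exp(-t / n_iters)        # decaying learning rate
        sigma = sigma0 * np.exp(-t / n_iters)  # shrinking neighborhood width
        xi = x[rng.integers(len(x))]           # random training sample
        bmu = np.argmin(np.abs(w - xi))        # best-matching unit
        h = np.exp(-((idx - bmu) ** 2) / (2.0 * sigma ** 2))
        w += lr * h * (xi - w)                 # pull neighborhood toward xi
    # Project unit weights onto the nearest original data points (see Notes).
    return np.unique(x[np.abs(x[:, None] - w[None, :]).argmin(axis=0)])

# Example: 200 observations collapse to at most 10 representative knots.
x = np.random.default_rng(1).uniform(2.1, 9.9, 200)
print(som_knot_candidates(x, n_units=10))
```

In the restructured forward step, MARS would then consider only such representative candidates when placing knots for each predictor, rather than every distinct data value.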
Notes
Here, for simplicity, the weight vectors associated with BMUs are projected to the nearest original data points, but other projection approaches may also be used; a small sketch of this projection follows these notes.
The code is available on request from the first author.
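A hedged numpy sketch of this nearest-data-point projection for multivariate BMU weight vectors (the function name is ours, for illustration, and not from the paper):

```python
import numpy as np

def project_bmus_to_data(weights, data):
    """Snap each SOM weight vector to its nearest original observation.

    weights : (k, p) array of BMU weight vectors
    data    : (n, p) array of original data points
    Returns a (k, p) array of the closest data points (Euclidean distance).
    """
    # Pairwise squared distances between every weight and every data point.
    d2 = ((weights[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)  # (k, n)
    return data[d2.argmin(axis=1)]
```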
References
Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., Csáki, F. (eds.) 2nd International Symposium on Information Theory. Akadémiai Kiadó, Budapest (1973)
Aldrin, M.: Improved predictions penalizing both slope and curvature in additive models. Comput. Stat. Data Anal. 50, 267–284 (2006)
Atilgan, T.: Basis Selection for Density Estimation and Regression. AT&T Bell Laboratories Technical Memorandum (1988)
Cervellera, C., Chen, V.C.P., Wen, A.: Neural network and regression spline value function approximations for stochastic programming. Comput. Oper. Res. 34, 70–90 (2006)
Chen, V.C.P.: Measuring the goodness of orthogonal array discretization for stochastic programming and stochastic dynamic programming. SIAM J. Optim. 12(2), 322–344 (2001)
Chen, V.C.P., Gunther, D., Johnson, E.L.: Solving for an optimal airline yield management policy via statistical learning. J. R. Stat. Soc. Ser. C 52(1), 1–12 (2003)
Chen, V.C.P., Ruppert, D., Shoemaker, C.A.: Applying experimental design and regression splines to high-dimensional continuous-state stochastic dynamic programming. Oper. Res. 47, 38–53 (1999)
Cortez, P., Cerdeira, A., Almeida, F., Matos, T., Reis, J.: Modeling wine preferences by data mining from physicochemical properties. Decis. Support Syst. 47(4), 547–553 (2009)
Craven, P., Wahba, G.: Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross validation. Numer. Math. 31, 377–403 (1979)
Crino, S., Brown, D.E.: Global optimization with multivariate adaptive regression splines. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 37(2), 333–340 (2007)
De Boor, C.: A Practical Guide to Splines. Springer, New York (1978)
Deichmann, J., Eshghi, A., Haughton, D., Sayek, S., Teebagy, N.: Application of multiple adaptive regression splines (MARS) in direct response modeling. J. Direct Market. 16, 15–27 (2002)
Denison, D.G.T., Mallick, B.K., Smith, A.F.M.: Automatic Bayesian curve fitting. J. R. Stat. Soc. Ser. B 60, 333–350 (1998)
Eilers, P.H.C., Marx, B.D.: Flexible smoothing with B-splines and penalties. Stat. Sci. 11, 89–121 (1996)
Fox, J.: Nonparametric Regression, an R and S-PLUS Companion to Applied Regression. Sage, Thousand Oaks (2002)
Frank, A., Asuncion, A.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA. http://archive.ics.uci.edu/ml (2010)
Friedman, J.H., Silverman, B.W.: Flexible parsimonious smoothing and additive modeling. Technometrics 31, 3–21 (1989)
Friedman, J.H.: Multivariate adaptive regression splines. Ann. Stat. 19, 1–67 (1991)
Friedman, J.H.: Fast MARS. Stanford University Department of Statistics, Technical Report 110 (1993)
Gersho, A., Gray, R.M.: Vector Quantization and Signal Compression. Kluwer, Norwell (1992)
Gibbons, J.D., Chakraborti, S.: Nonparametric Statistical Inference. Marcel Dekker, New York (2003)
Green, P.J., Silverman, B.W.: Nonparametric Regression and Generalized Linear Models. Chapman & Hall, Boca Raton (1994)
Hastie, T.J., Tibshirani, R.J., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York (2001)
Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice-Hall, New Jersey (1999)
Jekabsons, G.: ARESLab: Adaptive Regression Splines Toolbox for Matlab/Octave. http://www.cs.rtu.lv/jekabsons/ (2011)
Jin, R., Chen, W., Simpson, T.W.: Comparative studies of metamodeling techniques under multiple modeling criteria. Struct. Multidiscip. Optim. 23, 1–13 (2001)
Kohonen, T.: Self-Organization and Associative Memory. Springer, New York (1988)
Kubin, G.: Nonlinear prediction of mobile radio channels: measurements and MARS model designs. In: IEEE Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, March 15–19, pp. 2667–2670 (1999)
Leathwick, J.R., Rowe, D., Richardson, J., Elith, J., Hastie, T.: Using multivariate adaptive regression splines to predict the distributions of New Zealand’s freshwater diadromous fish. Freshw. Biol. 50, 2034–2052 (2005)
Leathwick, J.R., Elith, J., Hastie, T.: Comparative performance of generalized additive models and multivariate adaptive regression splines for statistical modelling of species distributions. Ecol. Model. 199, 188–196 (2006)
Lee, T.S., Chiu, C.C., Chou, Y.C., Lu, C.J.: Mining the customer credit using classification and regression tree and multivariate adaptive regression splines. Comput. Stat. Data Anal. 50, 1113–1130 (2006)
Luo, Z., Wahba, G.: Hybrid adaptive splines. J. Am. Stat. Assoc. 92, 107–116 (1997)
Mallows, C.L.: Some comments on Cp. Technometrics 15, 661–675 (1973)
Martinez, N., Martinez, D., Rosenberger, J.M., Chen, V.C.P.: Global Optimization for a Piecewise Linear Regression Spline Function. INFORMS Meeting (2011)
Miyata, S., Shen, X.: Free-knot splines and adaptive knot selection. J. Jpn. Stat. Soc. 35(2), 303–324 (2005)
Özmen, A.: Robust Conic Quadratic Programming Applied to Quality Improvement: A Robustification of CMARS. MSc Thesis, Middle East Technical University (2010)
Özmen, A., Weber, G.-W., Batmaz, İ.: The new robust CMARS (RCMARS) method. In: 24th Mini EURO Conference on Continuous Optimization and Information-Based Technologies in the Financial Sector, MEC EurOPT Selected Papers, ISI Proceedings, pp. 362–368 (2010)
Pilla, V.L., Rosenberger, J.M., Chen, V.C.P., Engsuwan, N., Siddappa, S.: A multivariate adaptive regression splines cutting plane approach for solving a two-stage stochastic programming fleet assignment model. Eur. J. Oper. Res. 216, 162–171 (2012)
Pilla, V., Rosenberger, J., Chen, V.: A statistical computer experiments approach to airline fleet assignment. IIE Trans. 40, 524–537 (2008)
Quinlan, R.: Combining instance-based and model-based learning. In: Proceedings of the Tenth International Conference on Machine Learning, University of Massachusetts, Amherst, pp. 236–243. Morgan Kaufmann (1993)
Sakamoto, W.: MARS: selecting basis functions and knots with an empirical Bayes method. Comput. Stat. 22, 583–597 (2007)
Stone, C.J., Koo, C.Y.: Additive splines in statistics. In: Proceedings of the Statistical Computing Section, pp. 45–48. American Statistical Association, Alexandria, VA (1985)
Stone, C., Hansen, M., Kooperberg, C., Truong, Y.: Polynomial splines and their tensor products in extended linear modeling. Ann. Stat. 25, 1371–1470 (1997)
Tsai, J.C.C., Chen, V.C.P., Chen, J., Beck, M.B.: Stochastic dynamic programming formulation for a wastewater treatment decision-making framework. Ann. Oper. Res. 132, 207–221 (2004)
Tsanas, A., Little, M.A., McSharry, P.E., Ramig, L.O.: Accurate telemonitoring of Parkinson’s disease progression by non-invasive speech tests. IEEE Trans. Biomed. Eng. 57(4), 884–894 (2010)
Vesanto, J., Himberg, J., Alhoniemi, E., Parhankangas, J.: SOM Toolbox for Matlab 5, Report A57. http://www.cis.hut.fi//projects/somtoolbox/ (2000)
Wahba, G.: Discussion of “Monotone regression splines in action” by J. Ramsay. Stat. Sci. 3, 425–462 (1988)
Weber, G.W., Batmaz, İ., Köksal, G., Taylan, P., Yerlikaya-Özkurt, F.: CMARS: a new contribution to nonparametric regression with multivariate adaptive regression splines supported by continuous optimization. Inverse Probl. Sci. Eng. 20(3), 371–400 (2011)
Wong, C., Kohn, R.: A Bayesian approach to additive semi-parametric regression. J. Econom. 74, 209–235 (1996)
Yeh, I.-C.: Modeling of strength of high performance concrete using artificial neural networks. Cem. Concr. Res. 28(12), 1797–1808 (1998)
Yerlikaya, F.: A New Contribution to Nonlinear Robust Regression and Classification with MARS and Its Application to Data Mining for Quality Control in Manufacturing. MSc Thesis, Institute of Applied Mathematics, METU, Ankara (2008)
York, T.P., Eaves, L.J., Van den Oord, E.J.C.G.: Multivariate adaptive regression splines: a powerful method for detecting disease-risk relationship differences among subgroups. Stat. Med. 25(8), 1355–1367 (2006)
Appendices
Appendix A
1.1 Mathematical formulations for data generation
-
\(P_{1}. f(x) = \sum \limits _{i=1}^7 {\left[ {\left( {\ln \left( {x_i -2} \right) } \right) ^{2}+\left( {\ln \left( {10-x_i } \right) } \right) ^{2}} \right] -\left( {\prod \limits _{i=1}^7 {x_i } } \right) ^{2}},\quad 2.1\le x_i \le 9.9\)
-
\(P_{2}. f(x) = \sum \limits _{j=1}^{10} {\exp ({x_j})\left( {c_j +x_j -\ln \left( {\sum \limits _{k=1}^{10} {\exp \left( {x_k}\right) }} \right) } \right) }\),
-
\(c_j = -6.089, -17.164, -34.054, -5.914, -24.721, -14.986, -24.100, -10.708, -26.662, -22.179\)
-
\(P_{3}. f(x) = x_1^2 +x_2^2 +x_1 x_2 -14x_1 -16x_2 +\left( {x_3 -10} \right) ^{2}+4\left( {x_4 -5} \right) ^{2}+\left( {x_5 -3} \right) ^{2} +2\left( {x_6 -1} \right) ^{2} +5x_7^2 +7\left( {x_8 -11} \right) ^{2}+2\left( {x_9 -10} \right) ^{2}+\left( {x_{10} -7} \right) ^{2}+45\)
-
\(P_{4}. f(x) = \sin \left( {\pi x_1 /12} \right) \cos \left( {\pi x_2 /16} \right) \)
-
\(P_{5}. f(x) = \left( {x_1 -1} \right) ^{2}+\left( {x_1 -x_2 } \right) ^{2}+\left( {x_2 -x_3 } \right) ^{4}\)
-
\(P_{6}. f(x) = 5.3578547x_2^2 +0.8356891x_1 x_3 +37.293239x_1 -40792.141\)
-
\(P_{8}. f(x) = 3(1-x)^{2}\exp \left( {-x^{2}-(y+1)^{2}} \right) -10\left( {x/5-x^{3}-y^{5}} \right) \exp \left( {-x^{2}-y^{2}} \right) -\frac{1}{3}\exp \left( {-(x+1)^{2}-y^{2}} \right) +2x\)
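As a concreteness check on how these functions feed the experiments, the short sketch below generates a noisy sample from \(P_{4}\); the sampling domain and noise level are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def p4(x1, x2):
    """Test function P4: sin(pi*x1/12) * cos(pi*x2/16)."""
    return np.sin(np.pi * x1 / 12.0) * np.cos(np.pi * x2 / 16.0)

# Draw predictors uniformly over an assumed domain and add Gaussian
# noise to the response; both choices are illustrative, not the paper's.
n = 200
x1 = rng.uniform(0.0, 24.0, n)
x2 = rng.uniform(0.0, 32.0, n)
y = p4(x1, x2) + rng.normal(0.0, 0.1, n)
```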
Appendix B
1.1 Grid plots of mathematical functions

Appendix C
1.1 Root mean squared error (RMSE)
RMSE highlights grossly inaccurate estimates; the smaller the RMSE, the better the fit:
\(\hbox {RMSE}=\sqrt{\frac{1}{n}\sum \nolimits _{i=1}^{n} \left( {y_i -\hat{{y}}_i } \right) ^{2}},\)
where \(n\) is the number of observations, \(y_i \) is the \(i\)th observed response value and \(\hat{{y}}_i \) is the \(i\)th fitted response.
1.2 Adjusted multiple coefficient of determination (Adj-R\(^{2}\))
The coefficient of determination (\(\hbox {R}^{2}\)) indicates how much of the variation in the response is explained by the model; \(\hbox {Adj-R}^{2}\) is its penalized form with respect to the number of predictors in the model. The higher the \(\hbox {Adj-R}^{2}\), the better the model fits the data:
\(\hbox {Adj-R}^{2}=1-\frac{\sum \nolimits _{i=1}^{n} \left( {y_i -\hat{{y}}_i } \right) ^{2}/\left( {n-p-1} \right) }{\sum \nolimits _{i=1}^{n} \left( {y_i -\bar{{y}}} \right) ^{2}/\left( {n-1} \right) },\)
where \(\left( {n-p-1} \right) \ne 0\); \(p\) is the number of terms in the model, \(y_i \) is the \(i\)th observed response value, \(\bar{{y}}\) is the mean response and \(\hat{{y}}_i \) is the \(i\)th fitted response.
1.3 Stability
where \(MR_{TR} \) and \(MR_{TE} \) represent the performance measures (RMSE or \(\hbox {Adj-R}^{2}\)) for the training and test data sets, respectively. A model whose stability measure is close to one is considered stable.
1.4 Generalized cross validation (GCV)
\(\hbox {GCV}(M)=\frac{\frac{1}{n}\sum \nolimits _{i=1}^{n} \left( {y_i -\hat{{f}}_M (\mathbf{x}_i )} \right) ^{2}}{\left( {1-P(M)/n} \right) ^{2}},\)
where \(y_i \) is the \(i\)th observed response value; \(\hat{{f}}_M (\mathbf{x}_i )\) is the fitted response value for the \(i\)th observed predictor vector \(\mathbf{x}_i =(x_{i,1},\ldots ,x_{i,p} )^{T}\;(i=1,\ldots ,n)\); \(n\) is the number of data points; and \(M\) is the maximum number of BFs in the model.
The value \(P(M)\) is the effective number of parameters, which penalizes model complexity. Here, \(P(M)=r+dN\), where \(r\) is the number of linearly independent BFs in the model and \(N\) is the number of knots selected in the forward process; \(d\) is taken to be 2 if the model is additive and 3 if it is an interaction model.
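For reference, a compact numpy sketch of these performance measures under the definitions above; the stability helper encodes one plausible reading (the test-to-training ratio) of the "close to one" criterion and, like all names here, is an assumption rather than the paper's code.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y - y_hat) ** 2))

def adj_r2(y, y_hat, p):
    """Adjusted R^2 for a model with p terms (requires n - p - 1 != 0)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))

def gcv(y, f_hat, r, n_knots, interaction=False):
    """Friedman's GCV: mean squared residual inflated by (1 - P(M)/n)^2,
    with effective parameters P(M) = r + d*N (d = 3 with interactions,
    d = 2 for additive models)."""
    y, f_hat = np.asarray(y, float), np.asarray(f_hat, float)
    n = len(y)
    d = 3 if interaction else 2
    p_m = r + d * n_knots
    return np.mean((y - f_hat) ** 2) / (1.0 - p_m / n) ** 2

def stability(mr_train, mr_test):
    """Assumed form of the stability measure: ratio of test to training
    performance; values near one indicate a stable model."""
    return mr_test / mr_train
```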
Appendix D
1.1 Red Wine quality
The dataset describes the red variant of the Portuguese “Vinho Verde” wine [8]. The inputs are objective tests (e.g., pH values), and the output is based on sensory data (the median of at least three evaluations made by wine experts). The input variables (based on physicochemical tests) are fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol. There are no missing attribute values in the data.
1.2 Concrete compressive strength data set
Concrete is the most important material in civil engineering, and its compressive strength is a highly nonlinear function of age and ingredients. These ingredients include cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, and fine aggregate [50]. There are no missing attribute values in the data.
1.3 Communities and crime
The dataset combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR. The variables describe the community, such as the percent of the population considered urban and the median family income, and law enforcement, such as the per capita number of police officers and the percent of officers assigned to drug units. The per capita violent crimes variable was calculated from the population and the sum of the crime variables considered violent in the United States: murder, rape, robbery, and assault. Note that the Communities and Crime dataset was reduced in both size and scale to be appropriate for our analysis.
1.4 Auto-Mpg data
The data concern city-cycle fuel consumption in miles per gallon, to be predicted from three multi-valued discrete and five continuous attributes, including cylinders, displacement, horsepower, weight, acceleration, model year, and origin [40]. The data contain six missing values.
1.5 Parkinsons telemonitoring
This dataset is composed of a range of biomedical voice measurements from 42 people with early-stage Parkinson’s disease recruited to a six-month trial of a telemonitoring device for remote symptom progression monitoring. The recordings were automatically captured in the patients’ homes. The columns contain subject number, subject age, subject gender, time interval from baseline recruitment date, motor UPDRS, total UPDRS, and 16 biomedical voice measures. Each row corresponds to one of 5,875 voice recordings from these individuals. The main aim is to predict the motor and total UPDRS scores (‘motor_UPDRS’ and ‘total_UPDRS’) from the 16 voice measures [45]. Note that the Parkinsons dataset was reduced in both size and scale to be appropriate for our analysis, and the motor UPDRS variable is used as the dependent variable.
1.6 PM10
The data are a subsample of 500 observations from a dataset originating in a study that relates air pollution at a road to traffic volume and meteorological variables, collected by the Norwegian Public Roads Administration. The response variable consists of hourly values of the logarithm of the PM10 (particle) concentration, measured at Alnabru in Oslo, Norway, between October 2001 and August 2003 [2].
1.7 Energy load
The data comprise the hourly electricity load of different regions in the Netherlands, recorded daily between 2008 and 2010. The aim is to predict the daily average electricity load from the past daily average values of energy demand. In the analysis, 1,730 records are used to predict the daily average energy load.
Cite this article
Koc, E.K., Iyigun, C. Restructuring forward step of MARS algorithm using a new knot selection procedure based on a mapping approach. J Glob Optim 60, 79–102 (2014). https://doi.org/10.1007/s10898-013-0107-5