Restructuring forward step of MARS algorithm using a new knot selection procedure based on a mapping approach

Journal of Global Optimization

Abstract

In high-dimensional data modeling, Multivariate Adaptive Regression Splines (MARS) is a popular nonparametric regression technique used to describe the nonlinear relationship between a response variable and the predictors with the help of splines. MARS uses piecewise linear functions for local fits and applies an adaptive procedure to select the number and locations of breaking points (called knots). The function estimate is generated via a two-step procedure: forward selection and backward elimination. In the first step, a large number of local fits is obtained by selecting a large number of knots via a lack-of-fit criterion; in the second, the least contributing local fits, or knots, are removed. In the conventional adaptive spline procedure, knots are selected from the set of all distinct data points, which makes the forward selection procedure computationally expensive and leads to high local variance. To avoid this drawback, the knot points can be restricted to a subset of the data points. In this context, a new knot selection method is proposed that is based on a mapping approach such as self-organizing maps. With this method, fewer but more representative data points become eligible to be used as knots for function estimation in the forward step of MARS. The proposed method is applied to many simulated and real datasets, and the results show that it provides a time-efficient forward step for knot selection and model estimation without degrading model accuracy and prediction performance.

Notes

  1. Here, for simplicity, the weight vectors associated with BMUs are projected onto the nearest original data points; other projection approaches may also be used (a minimal illustration is sketched after these notes).

  2. The code is available on request from the first author.
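
The following is a minimal sketch of the projection mentioned in Note 1 (illustrative only, not the authors' implementation): it assumes a trained SOM whose codebook (weight) vectors are stored row-wise in a NumPy array `weights`, together with the original data matrix `X`, and returns the unique nearest observations as the reduced candidate knot pool for the forward step.

```python
# Minimal illustration (not the authors' code): map each SOM codebook vector
# to its nearest original observation; the unique projected points form the
# reduced candidate knot pool used in the MARS forward step.
import numpy as np

def candidate_knots(X, weights):
    """X: (n, p) data matrix; weights: (m, p) codebook vectors of a trained SOM."""
    # Squared Euclidean distance from every codebook vector to every data point.
    d2 = ((weights[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # shape (m, n)
    nearest = d2.argmin(axis=1)            # index of the closest observation per BMU
    return X[np.unique(nearest)]           # duplicates collapse, giving a subset of X
```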

References

  1. Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., Csáki, F. (eds.) 2nd International Symposium on Information Theory. Akademiai Kiado, Budapest (1973)

  2. Aldrin, M.: Improved predictions penalizing both slope and curvature in additive models. Comput. Stat. Data Anal. 50, 267–284 (2006)

  3. Atilgan, T.: Basis Selection for Density Estimation and Regression. AT&T Bell Laboratories Technical Memorandum (1988)

  4. Cervellera, C., Chen, V.C.P., Wen, A.: Neural network and regression spline value function approximations for stochastic programming. Comput. Oper. Res. 34, 70–90 (2006)

  5. Chen, V.C.P.: Measuring the goodness of orthogonal array discretization for stochastic programming and stochastic dynamic programming. SIAM J. Optim. 12(2), 322–344 (2001)

  6. Chen, V.C.P., Gunther, D., Johnson, E.L.: Solving for an optimal airline yield management policy via statistical learning. J. R. Stat. Soc. Ser. C 52(1), 1–12 (2003)

  7. Chen, V.C.P., Ruppert, D., Shoemaker, C.A.: Applying experimental design and regression splines to high-dimensional continuous-state stochastic dynamic programming. Oper. Res. 47, 38–53 (1999)

  8. Cortez, P., Cerdeira, A., Almeida, F., Matos, T., Reis, J.: Modeling wine preferences by data mining from physicochemical properties. Decis. Support Syst. 47(4), 547–553 (2009)

  9. Craven, P., Wahba, G.: Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross validation. Numer. Math. 31, 377–403 (1979)

  10. Crino, S., Brown, D.E.: Global optimization with multivariate adaptive regression splines. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 37(2), 333–340 (2007)

  11. De Boor, C.: A Practical Guide to Splines. Springer, New York (1978)

  12. Deichmann, J., Eshghi, A., Haughton, D., Sayek, S., Teebagy, N.: Application of multiple adaptive regression splines (MARS) in direct response modeling. J. Direct Market. 16, 15–27 (2002)

  13. Denison, D.G.T., Mallick, B.K., Smith, A.F.M.: Automatic Bayesian curve fitting. J. R. Stat. Soc. Ser. B 60, 333–350 (1998)

  14. Eilers, P.H.C., Marx, B.D.: Flexible smoothing with B-splines and penalties. Stat. Sci. 11, 98–102 (1996)

  15. Fox, J.: Nonparametric Regression, An R and S-PLUS Companion to Applied Regression. Sage, Thousand Oaks (2002)

  16. Frank, A., Asuncion, A.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA. http://archive.ics.uci.edu/ml (2010)

  17. Friedman, J.H., Silverman, B.W.: Flexible parsimonious smoothing and additive modeling. Technometrics 31, 3–21 (1989)

  18. Friedman, J.H.: Multivariate adaptive regression splines. Ann. Stat. 19, 1–67 (1991)

  19. Friedman, J.H.: Fast MARS. Technical Report 110, Department of Statistics, Stanford University (1993)

  20. Gersho, A., Gray, R.M.: Vector Quantization and Signal Compression. Kluwer, Norwell (1992)

  21. Gibbons, J.D., Chakraborti, S.: Nonparametric Statistical Inference. Marcel Dekker, New York (2003)

  22. Green, P.J., Silverman, B.W.: Nonparametric Regression and Generalized Linear Models. Chapman & Hall, Boca Raton (1994)

  23. Hastie, T.J., Tibshirani, R.J., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York (2001)

  24. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice-Hall, New Jersey (1999)

  25. Jekabsons, G.: ARESLab: Adaptive Regression Splines toolbox for Matlab/Octave. http://www.cs.rtu.lv/jekabsons/ (2011)

  26. Jin, R., Chen, W., Simpson, T.W.: Comparative studies of metamodeling techniques under multiple modeling criteria. Struct. Multidiscip. Optim. 23, 1–13 (2001)

  27. Kohonen, T.: Self-Organization and Associative Memory. Springer, New York (1988)

  28. Kubin, G.: Nonlinear prediction of mobile radio channels: measurements and MARS model designs. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, March 15–19, pp. 2667–2670 (1999)

  29. Leathwick, J.R., Rowe, D., Richardson, J., Elith, J., Hastie, T.: Using multivariate adaptive regression splines to predict the distributions of New Zealand’s freshwater diadromous fish. Freshw. Biol. 50, 2034–2052 (2005)

  30. Leathwick, J.R., Elith, J., Hastie, T.: Comparative performance of generalized additive models and multivariate adaptive regression splines for statistical modelling of species distributions. Ecol. Model. 199, 188–196 (2006)

  31. Lee, T.S., Chiu, C.C., Chou, Y.C., Lu, C.J.: Mining the customer credit using classification and regression tree and multivariate adaptive regression splines. Comput. Stat. Data Anal. 50, 1113–1130 (2006)

  32. Luo, Z., Wahba, G.: Hybrid adaptive splines. J. Am. Stat. Assoc. 92, 107–116 (1997)

  33. Mallows, C.L.: Some comments on Cp. Technometrics 15, 661–675 (1973)

  34. Martinez, N., Martinez, D., Rosenberger, J.M., Chen, V.C.P.: Global Optimization for a Piecewise Linear Regression Spline Function. INFORMS Meeting (2011)

  35. Miyata, S., Shen, X.: Free-knot splines and adaptive knot selection. J. Jpn. Stat. Soc. 35(2), 303–324 (2005)

  36. Özmen, A.: Robust Conic Quadratic Programming Applied to Quality Improvement—A Robustification of CMARS. MSc thesis, Middle East Technical University (2010)

  37. Özmen, A., Weber, G.-W., Batmaz, İ.: The new robust CMARS (RCMARS) method. In: 24th Mini EURO Conference on Continuous Optimization and Information-Based Technologies in the Financial Sector, MEC EurOPT Selected Papers, ISI Proceedings, pp. 362–368 (2010)

  38. Pilla, V.L., Rosenberger, J.M., Chen, V.C.P., Engsuwan, N., Siddappa, S.: A multivariate adaptive regression splines cutting plane approach for solving a two-stage stochastic programming fleet assignment model. Eur. J. Oper. Res. 216, 162–171 (2012)

  39. Pilla, V., Rosenberger, J., Chen, V.: A statistical computer experiments approach to airline fleet assignment. IIE Trans. 40, 524–537 (2008)

  40. Quinlan, R.: Combining instance-based and model-based learning. In: Proceedings of the Tenth International Conference on Machine Learning, University of Massachusetts, Amherst, pp. 236–243. Morgan Kaufmann (1993)

  41. Sakamoto, W.: MARS: selecting basis functions and knots with an empirical Bayes method. Comput. Stat. 22, 583–597 (2007)

  42. Stone, C.J., Koo, C.Y.: Additive splines in statistics. In: Proceedings of the Statistical Computing Section, pp. 45–48. American Statistical Association, Alexandria, VA (1985)

  43. Stone, C., Hansen, M., Kooperberg, C., Truong, Y.: Polynomial splines and their tensor products in extended linear modeling. Ann. Stat. 25, 1371–1470 (1997)

  44. Tsai, J.C.C., Chen, V.C.P., Chen, J., Beck, M.B.: Stochastic dynamic programming formulation for a wastewater treatment decision-making framework. Ann. Oper. Res. 132, 207–221 (2004)

  45. Tsanas, A., Little, M.A., McSharry, P.E., Ramig, L.O.: Accurate telemonitoring of Parkinson’s disease progression by non-invasive speech tests. IEEE Trans. Biomed. Eng. 57(4), 884–894 (2010)

  46. Vesanto, J., Himberg, J., Alhoniemi, E., Parhankangas, J.: SOM Toolbox for Matlab 5, Report A57. http://www.cis.hut.fi//projects/somtoolbox/ (2000)

  47. Wahba, G.: Discussion of Ramsay, J.: Monotone regression splines in action. Stat. Sci. 3, 425–462 (1988)

  48. Weber, G.W., Batmaz, İ., Köksal, G., Taylan, P., Yerlikaya-Özkurt, F.: CMARS: a new contribution to nonparametric regression with multivariate adaptive regression splines supported by continuous optimization. Inverse Probl. Sci. Eng. 20(3), 371–400 (2011)

  49. Wong, C., Kohn, R.: A Bayesian approach to additive semi-parametric regression. J. Econom. 74, 209–235 (1996)

  50. Yeh, I.-C.: Modeling of strength of high performance concrete using artificial neural networks. Cem. Concr. Res. 28(12), 1797–1808 (1998)

  51. Yerlikaya, F.: A New Contribution to Nonlinear Robust Regression and Classification with MARS and Its Application to Data Mining for Quality Control in Manufacturing. MSc thesis, Institute of Applied Mathematics, METU, Ankara (2008)

  52. York, T.P., Eaves, L.J., Van den Oord, E.J.C.G.: Multivariate adaptive regression splines: a powerful method for detecting disease-risk relationship differences among subgroups. Stat. Med. 25(8), 1355–1367 (2006)

Author information

Corresponding author

Correspondence to Cem Iyigun.

Appendices

Appendix A

1.1 Mathematical formulations for data generation

  • \(P_{1}. f(x) = \sum \limits _{i=1}^{7} \left[ \left( \ln \left( x_i -2 \right) \right)^{2}+\left( \ln \left( 10-x_i \right) \right)^{2} \right] -\left( \prod \limits _{i=1}^{7} x_i \right)^{2}, \quad 2.1\le x_i \le 9.9\)

  • \(P_{2}. f(x) = \sum \limits _{j=1}^{10} {\exp ({x_j})\left( {c_j +x_j -\ln \left( {\sum \limits _{k=1}^{10} {\exp \left( {x_k}\right) }} \right) } \right) }\),

  • where \(c_j = -6.089, -17.164, -34.054, -5.914, -24.721, -14.986, -24.100, -10.708, -26.662, -22.179\)

  • \(P_{3}. f(x) = x_1^2 +x_2^2 +x_1 x_2 -14x_1 -16x_2 +\left( {x_3 -10} \right) ^{2}-4\left( {x_4 -5} \right) ^{2}+\left( {x_5 -3} \right) ^{2} +2\left( {x_6 -1} \right) ^{2} +5x_7^2 +7\left( {x_8 -11} \right) ^{2}+2\left( {x_9 -10} \right) ^{2}+\left( {x_{10} -7} \right) ^{2}+45\)

  • \(P_{4}. f(x) = \sin \left( {\pi x_1 /12} \right) \cos \left( {\pi x_2 /16} \right) \)

  • \(P_{5}. f(x) = \left( {x_1 -1} \right) ^{2}+\left( {x_1 -x_2 } \right) ^{2}+\left( {x_2 -x_3 } \right) ^{4}\)

  • \(P_{6}. f(x) = 5.3578547x_2^2 +0.8356891x_1 x_3 +37.293239x_1 -40792.141\)

  • \(P_{8}. f(x) = 3(1-x)^{2}\exp \left( -x^{2}-(y+1)^{2} \right) -10\left( x/5-x^{3}-y^{5} \right) \exp \left( -x^{2}-y^{2} \right) -\frac{1}{3}\exp \left( -(x+1)^{2}-y^{2} \right) +2x\)
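
As an illustration of how training data might be generated from these functions, the sketch below samples \(P_{5}\) at random points and adds Gaussian noise. The sample size, input range, and noise level are assumptions chosen for demonstration, not the settings used in the paper.

```python
# Illustrative sketch only: the sample size, input range, and noise level are
# assumptions chosen for demonstration, not the settings used in the paper.
import numpy as np

def p5(x):
    # P5: f(x) = (x1 - 1)^2 + (x1 - x2)^2 + (x2 - x3)^4
    return (x[:, 0] - 1) ** 2 + (x[:, 0] - x[:, 1]) ** 2 + (x[:, 1] - x[:, 2]) ** 4

rng = np.random.default_rng(0)
n = 200
X = rng.uniform(-2.0, 2.0, size=(n, 3))        # assumed input range
y = p5(X) + rng.normal(scale=0.1, size=n)      # noisy responses for model fitting
```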

Appendix B

1.1 Grid plots of mathematical functions

Figure: grid plots of the mathematical functions listed in Appendix A.

Appendix C

1.1 Root mean squared error (RMSE)

RMSE is sensitive to grossly inaccurate estimates; the smaller the RMSE, the better the fit.

$$\begin{aligned} RMSE=\sqrt{\frac{1}{n}\sum _{i=1}^{n}{\left( y_i -\hat{{y}}_i \right) ^{2}}}, \end{aligned}$$

where \(n\) is the number of observations, \(y_i\) is the \(i\)th observed response value and \(\hat{{y}}_i\) is the \(i\)th fitted response.
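
As a direct check of the formula, a short sketch (illustrative; assumes `y` and `y_hat` are array-like observed and fitted responses):

```python
import numpy as np

def rmse(y, y_hat):
    # Root mean squared error over the n observations
    return np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))
```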

1.2 Adjusted multiple coefficient of determination (Adj-R\(^{2}\))

The coefficient of determination (\(\hbox {R}^{2}\)) indicates how much of the variation in the response is explained by the model. \(\hbox {Adj-R}^{2}\) is a penalized form of \(\hbox {R}^{2}\) that accounts for the number of predictors in the model. The higher the \(\hbox {Adj-R}^{2}\), the better the model fits the data.

$$\begin{aligned} Adj-R^{2}=1-\left( \frac{\sum {\left( y_i -\hat{{y}}_i \right) ^{2}}}{\sum {\left( y_i -\bar{{y}} \right) ^{2}}} \right) \left( {\frac{n-1}{n-p-1}} \right) , \end{aligned}$$

where \(\left( {n-p-1} \right) \ne 0\); \(p\) is the number of terms in the model, \(y_i\) is the \(i\)th observed response value, \(\bar{{y}}\) is the mean response and \(\hat{{y}}_i\) is the \(i\)th fitted response.
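
A corresponding sketch (illustrative; the argument `p` is the number of model terms, as defined above):

```python
import numpy as np

def adj_r2(y, y_hat, p):
    # p: number of terms in the model; requires n - p - 1 != 0
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)        # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)     # total sum of squares
    return 1.0 - (rss / tss) * (n - 1) / (n - p - 1)
```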

1.3 Stability

$$\begin{aligned} \min \left\{ {\frac{MR_{TR} }{MR_{TE} },\frac{MR_{TE} }{MR_{TR} }} \right\} , \end{aligned}$$

where \(MR_{TR}\) and \(MR_{TE}\) represent the performance measures (RMSE or \(\hbox {Adj-R}^{2}\)) for the training and test data sets, respectively. A model whose stability measure is close to one is considered stable.
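
A direct translation (illustrative; `mr_train` and `mr_test` denote the same performance measure computed on the training and test sets, respectively):

```python
def stability(mr_train, mr_test):
    # Ratio of training and test performance measures (RMSE or Adj-R^2);
    # values close to one indicate a stable model.
    return min(mr_train / mr_test, mr_test / mr_train)
```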

1.4 Generalized cross validation (GCV)

$$\begin{aligned} GCV(M)=\frac{1}{n}\frac{\sum _{i=1}^n {(y_i -\hat{{f}}_M (\mathbf{x}_i ))^{2}} }{(1-{P(M)}/n)^{2}}, \end{aligned}$$

where \(y_i\) is the \(i\)th observed response value, \(\hat{{f}}_M (\mathbf{x}_i)\) is the fitted response value obtained for the \(i\)th observed predictor vector \(\mathbf{x}_i =(x_{i,1},\ldots ,x_{i,p})^{T}\) \((i=1,\ldots ,n)\), \(n\) is the number of data points, and \(M\) represents the maximum number of BFs in the model.

The value \(P(M)\) is the effective number of parameters, which serves as a penalty for model complexity. Here, \(P(M)=r+dN\), where \(r\) is the number of linearly independent BFs in the model, and \(N\) is the number of knots selected in the forward process. If the model is additive, \(d = 2\); if the model includes interactions, \(d = 3\).
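
A sketch of the criterion under these definitions (illustrative; the argument names `r`, `n_knots`, and `d` are ours):

```python
import numpy as np

def gcv(y, y_hat, r, n_knots, d):
    # Effective number of parameters: P(M) = r + d * N, with d = 2 (additive)
    # or d = 3 (interaction model).
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    n = len(y)
    penalty = r + d * n_knots
    return np.mean((y - y_hat) ** 2) / (1.0 - penalty / n) ** 2
```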

Appendix D

1.1 Red Wine quality

The dataset is related to the red variant of the Portuguese “Vinho Verde” wine [8]. The inputs include objective tests (e.g. pH values) and the output is based on sensory data (the median of at least three evaluations made by wine experts). The input variables (based on physicochemical tests) are fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol. There are no missing attribute values in the data.

1.2 Concrete compressive strength data set

Concrete is the most important material in civil engineering. Concrete compressive strength is a highly nonlinear function of age and ingredients. These ingredients include cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, and fine aggregate [50]. There are no missing attribute values in the data.

1.3 Communities and crime

The data combine socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR. The variables included in the dataset describe the community (such as the percentage of the population considered urban and the median family income) and law enforcement (such as the per capita number of police officers and the percentage of officers assigned to drug units). The per capita violent crimes variable was calculated using the population and the sum of crime variables considered violent crimes in the United States: murder, rape, robbery, and assault. It should be noted that the Communities and Crime dataset was reduced in both size and scale to be appropriate for our analysis.

1.4 Auto-Mpg data

The data concern city-cycle fuel consumption, in miles per gallon, to be predicted from three multi-valued discrete and five continuous attributes: cylinders, displacement, horsepower, weight, acceleration, model year, and origin [40]. The data contain six missing values.

1.5 Parkinsons telemonitoring

This dataset is composed of a range of biomedical voice measurements from 42 people with early-stage Parkinson’s disease recruited to a six-month trial of a telemonitoring device for remote symptom progression monitoring. The recordings were automatically captured in the patients’ homes. Columns in the table contain subject number, subject age, subject gender, time interval from baseline recruitment date, motor UPDRS, total UPDRS, and 16 biomedical voice measures. Each row corresponds to one of the 5,875 voice recordings from these individuals. The main aim of the data is to predict the motor and total UPDRS scores (‘motor_UPDRS’ and ‘total_UPDRS’) from the 16 voice measures [45]. It should be noted that the Parkinsons dataset was reduced in both size and scale to be appropriate for our analysis, and the “motor UPDRS” variable is used as the dependent variable.

1.6 PM10

The data are a subsample of 500 observations from a data set that originates in a study where air pollution at a road is related to traffic volume and meteorological variables, collected by the Norwegian Public Roads Administration. The response variable consists of hourly values of the logarithm of the concentration of PM10 (particles), measured at Alnabru in Oslo, Norway, between October 2001 and August 2003 [2].

1.7 Energy load

The data include the hourly electricity load of different regions in the Netherlands, recorded daily between 2008 and 2010. The aim is to predict the daily average electricity load from past average values of energy demand. In the analysis, 1730 records are used to predict the daily average energy load.

Cite this article

Koc, E.K., Iyigun, C. Restructuring forward step of MARS algorithm using a new knot selection procedure based on a mapping approach. J Glob Optim 60, 79–102 (2014). https://doi.org/10.1007/s10898-013-0107-5
