Abstract
In high-dimensional data modeling, Multivariate Adaptive Regression Splines (MARS) is a popular nonparametric regression technique used to describe the nonlinear relationship between a response variable and the predictors with the help of splines. MARS uses piecewise linear functions for local fits and applies an adaptive procedure to select the number and location of the breaking points (called knots). The function estimate is generated via a two-step procedure: forward selection and backward elimination. In the first step, a large number of local fits is obtained by selecting a large number of knots via a lack-of-fit criterion; in the second, the least contributing local fits, or knots, are removed. In the conventional adaptive spline procedure, knots are selected from the set of all distinct data points, which makes the forward selection procedure computationally expensive and leads to high local variance. To avoid this drawback, it is possible to restrict the knot points to a subset of the data points. In this context, a new method for knot selection is proposed, based on a mapping approach such as self-organizing maps. With this method, fewer but more representative data points become eligible to be used as knots for function estimation in the forward step of MARS. The proposed method is applied to many simulated and real datasets, and the results show that it yields a time-efficient forward step for knot selection and model estimation without degrading the model accuracy and prediction performance.
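To make the proposed forward-step restriction concrete, the minimal sketch below trains a small one-dimensional self-organizing map on a single predictor and snaps each learned unit weight to the nearest observation, yielding a reduced set of knot candidates. It is a numpy illustration under our own naming and default parameters, not the authors' implementation.

```python
import numpy as np

def som_knot_candidates(x, n_units=10, n_iters=500, lr0=0.5, sigma0=None, seed=0):
    """Reduce the knot candidate set for one predictor via a 1-D SOM.

    A small self-organizing map is trained on the predictor values, and
    each unit's weight is then snapped to the nearest observed data point,
    so the forward step of MARS searches far fewer candidate knots.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    if sigma0 is None:
        sigma0 = n_units / 2.0
    w = np.linspace(x.min(), x.max(), n_units)  # unit weights over data range
    idx = np.arange(n_units)
    for t in range(n_iters):
        lr = lr0 * np.exp(-t / n_iters)        # decaying learning rate
        sigma = sigma0 * np.exp(-t / n_iters)  # shrinking neighborhood width
        xi = x[rng.integers(len(x))]           # random training sample
        bmu = np.argmin(np.abs(w - xi))        # best-matching unit
        h = np.exp(-((idx - bmu) ** 2) / (2.0 * sigma ** 2))
        w += lr * h * (xi - w)                 # pull neighborhood toward xi
    # Project unit weights onto the nearest original data points (see Notes).
    return np.unique(x[np.abs(x[:, None] - w[None, :]).argmin(axis=0)])

# Example: 200 observations collapse to at most 10 representative knots.
x = np.random.default_rng(1).uniform(2.1, 9.9, 200)
print(som_knot_candidates(x, n_units=10))
```

In the restructured forward step, MARS would then consider only such representative candidates when placing knots for each predictor, rather than every distinct data value.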
Notes
Here, for simplicity, the weight vectors associated with BMUs are projected to the nearest original data points, but other projection approaches may also be used; a small sketch of this projection follows these notes.
The code is available on request from the first author.
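A hedged numpy sketch of this nearest-data-point projection for multivariate BMU weight vectors (the function name is ours, for illustration, and not from the paper):

```python
import numpy as np

def project_bmus_to_data(weights, data):
    """Snap each SOM weight vector to its nearest original observation.

    weights : (k, p) array of BMU weight vectors
    data    : (n, p) array of original data points
    Returns a (k, p) array of the closest data points (Euclidean distance).
    """
    # Pairwise squared distances between every weight and every data point.
    d2 = ((weights[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)  # (k, n)
    return data[d2.argmin(axis=1)]
```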
References
Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., Csáki, F. (eds.) 2nd International Symposium on Information Theory. Akadémiai Kiadó, Budapest (1973)
Aldrin, M.: Improved predictions penalizing both slope and curvature in additive models. Comput. Stat. Data Anal. 50, 267–284 (2006)
Atilgan, T.: Basis Selection for Density Estimation and Regression. AT&T Bell Laboratories Technical Memorandum (1988)
Cervellera, C., Chen, V.C.P., Wen, A.: Neural network and regression spline value function approximations for stochastic programming. Comput. Oper. Res. 34, 70–90 (2006)
Chen, V.C.P.: Measuring the goodness of orthogonal array discretization for stochastic programming and stochastic dynamic programming. SIAM J. Optim. 12(2), 322–344 (2001)
Chen, V.C.P., Gunther, D., Johnson, E.L.: Solving for an optimal airline yield management policy via statistical learning. J. R. Stat. Soc. Ser. C 52(1), 1–12 (2003)
Chen, V.C.P., Ruppert, D., Shoemaker, C.A.: Applying experimental design and regression splines to high-dimensional continuous-state stochastic dynamic programming. Oper. Res. 47, 38–53 (1999)
Cortez, P., Cerdeira, A., Almeida, F., Matos, T., Reis, J.: Modeling wine preferences by data mining from physicochemical properties. Decis. Support Syst. 47(4), 547–553 (2009)
Craven, P., Wahba, G.: Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross validation. Numer. Math. 31, 377–403 (1979)
Crino, S., Brown, D.E.: Global optimization with multivariate adaptive regression splines. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 37(2), 333–340 (2007)
De Boor, C.: A Practical Guide to Splines. Springer, New York (1978)
Deichmann, J., Eshghi, A., Haughton, D., Sayek, S., Teebagy, N.: Application of multiple adaptive regression splines (MARS) in direct response modeling. J. Direct Market. 16, 15–27 (2002)
Denison, D.G.T., Mallick, B.K., Smith, A.F.M.: Automatic Bayesian curve fitting. J. R. Stat. Soc. Ser. B 60, 333–350 (1998)
Eilers, P.H.C., Marx, B.D.: Flexible smoothing with B-splines and penalties. Stat. Sci. 11, 89–121 (1996)
Fox, J.: Nonparametric Regression, an R and S-PLUS Companion to Applied Regression. Sage, Thousand Oaks (2002)
Frank, A., Asuncion, A.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA. http://archive.ics.uci.edu/ml (2010)
Friedman, J.H., Silverman, B.W.: Flexible parsimonious smoothing and additive modeling. Technometrics 31, 3–21 (1989)
Friedman, J.H.: Multivariate adaptive regression splines. Ann. Stat. 19, 1–67 (1991)
Friedman, J.H.: Fast MARS. Stanford University Department of Statistics, Technical Report 110 (1993)
Gersho, A., Gray, R.M.: Vector Quantization and Signal Compression. Kluwer, Norwell (1992)
Gibbons, J.D., Chakraborti, S.: Nonparametric Statistical Inference. Marcel Dekker, New York (2003)
Green, P.J., Silverman, B.W.: Nonparametric Regression and Generalized Linear Models. Chapman & Hall, Boca Raton (1994)
Hastie, T.J., Tibshirani, R.J., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York (2001)
Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice-Hall, New Jersey (1999)
Jekabsons, G.: ARESLab: Adaptive Regression Splines Toolbox for Matlab/Octave. http://www.cs.rtu.lv/jekabsons/ (2011)
Jin, R., Chen, W., Simpson, T.W.: Comparative studies of metamodeling techniques under multiple modeling criteria. Struct. Multidiscip. Optim. 23, 1–13 (2001)
Kohonen, T.: Self-Organization and Associative Memory. Springer, New York (1988)
Kubin, G.: Nonlinear prediction of mobile radio channels: measurements and MARS model designs. In: IEEE Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, March 15–19, pp. 2667–2670 (1999)
Leathwick, J.R., Rowe, D., Richardson, J., Elith, J., Hastie, T.: Using multivariate adaptive regression splines to predict the distributions of New Zealand’s freshwater diadromous fish. Freshw. Biol. 50, 2034–2052 (2005)
Leathwick, J.R., Elith, J., Hastie, T.: Comparative performance of generalized additive models and multivariate adaptive regression splines for statistical modelling of species distributions. Ecol. Model. 199, 188–196 (2006)
Lee, T.S., Chiu, C.C., Chou, Y.C., Lu, C.J.: Mining the customer credit using classification and regression tree and multivariate adaptive regression splines. Comput. Stat. Data Anal. 50, 1113–1130 (2006)
Luo, Z., Wahba, G.: Hybrid adaptive splines. J. Am. Stat. Assoc. 92, 107–116 (1997)
Mallows, C.L.: Some comments on Cp. Technometrics 15, 661–675 (1973)
Martinez, N., Martinez, D., Rosenberger, J.M., Chen, V.C.P.: Global Optimization for a Piecewise Linear Regression Spline Function. INFORMS Meeting (2011)
Miyata, S., Shen, X.: Free-knot splines and adaptive knot selection. J. Jpn. Stat. Soc. 35(2), 303–324 (2005)
Özmen, A.: Robust Conic Quadratic Programming Applied to Quality Improvement: A Robustification of CMARS. MSc Thesis, Middle East Technical University (2010)
Özmen, A., Weber, G.-W., Batmaz, İ.: The new robust CMARS (RCMARS) method. In: 24th Mini EURO Conference on Continuous Optimization and Information-Based Technologies in the Financial Sector, MEC EurOPT Selected Papers, ISI Proceedings, pp. 362–368 (2010)
Pilla, V.L., Rosenberger, J.M., Chen, V.C.P., Engsuwan, N., Siddappa, S.: A multivariate adaptive regression splines cutting plane approach for solving a two-stage stochastic programming fleet assignment model. Eur. J. Oper. Res. 216, 162–171 (2012)
Pilla, V., Rosenberger, J., Chen, V.: A statistical computer experiments approach to airline fleet assignment. IIE Trans. 40, 524–537 (2008)
Quinlan, R.: Combining instance-based and model-based learning. In: Proceedings of the Tenth International Conference on Machine Learning, University of Massachusetts, Amherst, pp. 236–243. Morgan Kaufmann (1993)
Sakamoto, W.: MARS: selecting basis functions and knots with an empirical Bayes method. Comput. Stat. 22, 583–597 (2007)
Stone, C.J., Koo, C.Y.: Additive splines in statistics. In: Proceedings of the Statistical Computing Section, pp. 45–48. American Statistical Association, Alexandria, VA (1985)
Stone, C., Hansen, M., Kooperberg, C., Truong, Y.: Polynomial splines and their tensor products in extended linear modeling. Ann. Stat. 25, 1371–1470 (1997)
Tsai, J.C.C., Chen, V.C.P., Chen, J., Beck, M.B.: Stochastic dynamic programming formulation for a wastewater treatment decision-making framework. Ann. Oper. Res. 132, 207–221 (2004)
Tsanas, A., Little, M.A., McSharry, P.E., Ramig, L.O.: Accurate telemonitoring of Parkinson’s disease progression by non-invasive speech tests. IEEE Trans. Biomed. Eng. 57(4), 884–894 (2010)
Vesanto, J., Himberg, J., Alhoniemi, E., Parhankangas, J.: SOM Toolbox for Matlab 5, Report A57. http://www.cis.hut.fi//projects/somtoolbox/ (2000)
Wahba, G.: Discussion of “Monotone regression splines in action” by J. Ramsay. Stat. Sci. 3, 425–462 (1988)
Weber, G.W., Batmaz, İ., Köksal, G., Taylan, P., Yerlikaya-Özkurt, F.: CMARS: a new contribution to nonparametric regression with multivariate adaptive regression splines supported by continuous optimization. Inverse Probl. Sci. Eng. 20(3), 371–400 (2011)
Wong, C., Kohn, R.: A Bayesian approach to additive semi-parametric regression. J. Econom. 74, 209–235 (1996)
Yeh, I.-C.: Modeling of strength of high performance concrete using artificial neural networks. Cem. Concr. Res. 28(12), 1797–1808 (1998)
Yerlikaya, F.: A New Contribution to Nonlinear Robust Regression and Classification with MARS and Its Application to Data Mining for Quality Control in Manufacturing. MSc Thesis, Institute of Applied Mathematics, METU, Ankara (2008)
York, T.P., Eaves, L.J., Van den Oord, E.J.C.G.: Multivariate adaptive regression splines: a powerful method for detecting disease-risk relationship differences among subgroups. Stat. Med. 25(8), 1355–1367 (2006)
Appendices
Appendix A
1.1 Mathematical formulations for data generation
-
\(P_{1}. f(x) = \sum \limits _{i=1}^7 {\left[ {\left( {\ln \left( {x_i -2} \right) } \right) ^{2}+\left( {\ln \left( {10-x_i } \right) } \right) ^{2}} \right] -\left( {\prod \limits _{i=1}^7 {x_i } } \right) ^{2}},\quad 2.1\le x_i \le 9.9\)
-
\(P_{2}. f(x) = \sum \limits _{j=1}^{10} {\exp ({x_j})\left( {c_j +x_j -\ln \left( {\sum \limits _{k=1}^{10} {\exp \left( {x_k}\right) }} \right) } \right) }\),
-
\(c_j = -6.089, -17.164, -34.054, -5.914, -24.721, -14.986, -24.100, -10.708, -26.662, -22.179\)
-
\(P_{3}. f(x) = x_1^2 +x_2^2 +x_1 x_2 -14x_1 -16x_2 +\left( {x_3 -10} \right) ^{2}+4\left( {x_4 -5} \right) ^{2}+\left( {x_5 -3} \right) ^{2} +2\left( {x_6 -1} \right) ^{2} +5x_7^2 +7\left( {x_8 -11} \right) ^{2}+2\left( {x_9 -10} \right) ^{2}+\left( {x_{10} -7} \right) ^{2}+45\)
-
\(P_{4}. f(x) = \sin \left( {\pi x_1 /12} \right) \cos \left( {\pi x_2 /16} \right) \)
-
\(P_{5}. f(x) = \left( {x_1 -1} \right) ^{2}+\left( {x_1 -x_2 } \right) ^{2}+\left( {x_2 -x_3 } \right) ^{4}\)
-
\(P_{6}. f(x) = 5.3578547x_2^2 +0.8356891x_1 x_3 +37.293239x_1 -40792.141\)
-
\(P_{8}. f(x) = 3(1-x)^{2}\exp \left( {-x^{2}-(y+1)^{2}} \right) -10\left( {x/5-x^{3}-y^{5}} \right) \exp \left( {-x^{2}-y^{2}} \right) -\frac{1}{3}\exp \left( {-(x+1)^{2}-y^{2}} \right) +2x\)
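As a concreteness check on how these functions feed the experiments, the short sketch below generates a noisy sample from \(P_{4}\); the sampling domain and noise level are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def p4(x1, x2):
    """Test function P4: sin(pi*x1/12) * cos(pi*x2/16)."""
    return np.sin(np.pi * x1 / 12.0) * np.cos(np.pi * x2 / 16.0)

# Draw predictors uniformly over an assumed domain and add Gaussian
# noise to the response; both choices are illustrative, not the paper's.
n = 200
x1 = rng.uniform(0.0, 24.0, n)
x2 = rng.uniform(0.0, 32.0, n)
y = p4(x1, x2) + rng.normal(0.0, 0.1, n)
```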
Appendix B
1.1 Grid plots of mathematical functions

Appendix C
1.1 Root mean squared error (RMSE)
RMSE highlights grossly inaccurate estimates; the smaller the RMSE, the better the fit:
\(\hbox {RMSE}=\sqrt{\frac{1}{n}\sum \nolimits _{i=1}^{n} \left( {y_i -\hat{{y}}_i } \right) ^{2}},\)
where \(n\) is the number of observations, \(y_i \) is the \(i\)th observed response value and \(\hat{{y}}_i \) is the \(i\)th fitted response.
1.2 Adjusted multiple coefficient of determination (Adj-R\(^{2}\))
The coefficient of determination (\(\hbox {R}^{2}\)) indicates how much of the variation in the response is explained by the model; \(\hbox {Adj-R}^{2}\) is its penalized form with respect to the number of predictors in the model. The higher the \(\hbox {Adj-R}^{2}\), the better the model fits the data:
\(\hbox {Adj-R}^{2}=1-\frac{\sum \nolimits _{i=1}^{n} \left( {y_i -\hat{{y}}_i } \right) ^{2}/\left( {n-p-1} \right) }{\sum \nolimits _{i=1}^{n} \left( {y_i -\bar{{y}}} \right) ^{2}/\left( {n-1} \right) },\)
where \(\left( {n-p-1} \right) \ne 0\); \(p\) is the number of terms in the model, \(y_i \) is the \(i\)th observed response value, \(\bar{{y}}\) is the mean response and \(\hat{{y}}_i \) is the \(i\)th fitted response.
1.3 Stability
where \(MR_{TR} \) and \(MR_{TE} \) represent the performance measures (RMSE or \(\hbox {Adj-R}^{2}\)) for the training and test data sets, respectively. A model whose stability measure is close to one is considered stable.
1.4 Generalized cross validation (GCV)
\(\hbox {GCV}(M)=\frac{\frac{1}{n}\sum \nolimits _{i=1}^{n} \left( {y_i -\hat{{f}}_M (\mathbf{x}_i )} \right) ^{2}}{\left( {1-P(M)/n} \right) ^{2}},\)
where \(y_i \) is the \(i\)th observed response value; \(\hat{{f}}_M (\mathbf{x}_i )\) is the fitted response value for the \(i\)th observed predictor vector \(\mathbf{x}_i =(x_{i,1},\ldots ,x_{i,p} )^{T}\;(i=1,\ldots ,n)\); \(n\) is the number of data points; and \(M\) is the maximum number of BFs in the model.
The value \(P(M)\) is the effective number of parameters, which penalizes model complexity. Here, \(P(M)=r+dN\), where \(r\) is the number of linearly independent BFs in the model and \(N\) is the number of knots selected in the forward process; \(d\) is taken to be 2 if the model is additive and 3 if it is an interaction model.
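For reference, a compact numpy sketch of these performance measures under the definitions above; the stability helper encodes one plausible reading (the test-to-training ratio) of the "close to one" criterion and, like all names here, is an assumption rather than the paper's code.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y - y_hat) ** 2))

def adj_r2(y, y_hat, p):
    """Adjusted R^2 for a model with p terms (requires n - p - 1 != 0)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))

def gcv(y, f_hat, r, n_knots, interaction=False):
    """Friedman's GCV: mean squared residual inflated by (1 - P(M)/n)^2,
    with effective parameters P(M) = r + d*N (d = 3 with interactions,
    d = 2 for additive models)."""
    y, f_hat = np.asarray(y, float), np.asarray(f_hat, float)
    n = len(y)
    d = 3 if interaction else 2
    p_m = r + d * n_knots
    return np.mean((y - f_hat) ** 2) / (1.0 - p_m / n) ** 2

def stability(mr_train, mr_test):
    """Assumed form of the stability measure: ratio of test to training
    performance; values near one indicate a stable model."""
    return mr_test / mr_train
```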
Appendix D
1.1 Red Wine quality
The dataset describes the red variant of the Portuguese “Vinho Verde” wine [8]. The inputs are objective tests (e.g., pH values), and the output is based on sensory data (the median of at least three evaluations made by wine experts). The input variables (based on physicochemical tests) are fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol. There are no missing attribute values in the data.
1.2 Concrete compressive strength data set
Concrete is the most important material in civil engineering, and its compressive strength is a highly nonlinear function of age and ingredients. These ingredients include cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, and fine aggregate [50]. There are no missing attribute values in the data.
1.3 Communities and crime
The dataset combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR. The variables describe the community, such as the percent of the population considered urban and the median family income, and law enforcement, such as the per capita number of police officers and the percent of officers assigned to drug units. The per capita violent crimes variable was calculated from the population and the sum of the crime variables considered violent in the United States: murder, rape, robbery, and assault. Note that the Communities and Crime dataset was reduced in both size and scale to be appropriate for our analysis.
1.4 Auto-Mpg data
The data concern city-cycle fuel consumption in miles per gallon, to be predicted from three multi-valued discrete and five continuous attributes, including cylinders, displacement, horsepower, weight, acceleration, model year, and origin [40]. The data contain six missing values.
1.5 Parkinsons telemonitoring
This dataset is composed of a range of biomedical voice measurements from 42 people with early-stage Parkinson’s disease recruited to a six-month trial of a telemonitoring device for remote symptom progression monitoring. The recordings were automatically captured in the patients’ homes. The columns contain subject number, subject age, subject gender, time interval from baseline recruitment date, motor UPDRS, total UPDRS, and 16 biomedical voice measures. Each row corresponds to one of 5,875 voice recordings from these individuals. The main aim is to predict the motor and total UPDRS scores (‘motor_UPDRS’ and ‘total_UPDRS’) from the 16 voice measures [45]. Note that the Parkinsons dataset was reduced in both size and scale to be appropriate for our analysis, and the motor UPDRS variable is used as the dependent variable.
1.6 PM10
The data are a subsample of 500 observations from a dataset originating in a study that relates air pollution at a road to traffic volume and meteorological variables, collected by the Norwegian Public Roads Administration. The response variable consists of hourly values of the logarithm of the PM10 (particle) concentration, measured at Alnabru in Oslo, Norway, between October 2001 and August 2003 [2].
1.7 Energy load
The data comprise the hourly electricity load of different regions in the Netherlands, recorded daily between 2008 and 2010. The aim is to predict the daily average electricity load from the past daily average values of energy demand. In the analysis, 1,730 records are used to predict the daily average energy load.
Cite this article
Koc, E.K., Iyigun, C. Restructuring forward step of MARS algorithm using a new knot selection procedure based on a mapping approach. J Glob Optim 60, 79–102 (2014). https://doi.org/10.1007/s10898-013-0107-5