Efficient computer experiment-based optimization through variable selection

Shih, Dachuan T.; Kim, Seoung Bum; Chen, Victoria C. P.; Rosenberger, Jay M.; Pilla, Venkata L.

doi:10.1007/s10479-012-1129-y


Published: 18 April 2012

Volume 216, pages 287–305, (2014)

Annals of Operations Research



Abstract

A computer experiment-based optimization approach employs design of experiments and statistical modeling to represent a complex objective function that can only be evaluated pointwise by running a computer model. In large-scale applications, the number of variables is huge, and direct use of computer experiments would require an exceedingly large experimental design and, consequently, significant computational effort. If a large portion of the variables have little impact on the objective, then there is a need to eliminate these before performing the complete set of computer experiments. This is a variable selection task. The ideal variable selection method for this task should handle unknown nonlinear structure, should be computationally fast, and would be conducted after a small number of computer experiment runs, likely fewer runs (N) than the number of variables (P). Conventional variable selection techniques are based on assumed linear model forms and cannot be applied in this “large P and small N” problem. In this paper, we present a framework that adds a variable selection step prior to computer experiment-based optimization, and we consider data mining methods, using principal components analysis and multiple testing based on false discovery rate, that are appropriate for our variable selection task. An airline fleet assignment case study is used to illustrate our approach.
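The false discovery rate screening mentioned in the abstract can be illustrated with the Benjamini and Hochberg (1995) step-up procedure. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function name `benjamini_hochberg`, the level `alpha`, and the example p-values are hypothetical, and it presumes that one p-value per candidate variable has already been computed from the initial computer experiment runs.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected at FDR level alpha.

    Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha,
    then reject every hypothesis whose p-value has rank <= k.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank  # keep the largest rank that passes
    return sorted(order[:cutoff])


# Hypothetical p-values, one per candidate variable.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
selected = benjamini_hochberg(example_p, alpha=0.05)
print(selected)  # indices of the variables that would be retained
```

Variables whose indices are returned would be kept for the full set of computer experiments; the remaining ones would be treated as having little impact on the objective.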


References

Bellman, R. E. (1957). Dynamic programming. Princeton: Princeton University Press.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B, 57, 289–300.
Benjamini, Y., & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29, 1165–1188.
Berge, M. E., & Hopperstad, C. A. (1993). Demand driven dispatch: a method of dynamic aircraft capacity assignment, models and algorithms. Operations Research, 41(1), 153–168.
Birge, J. R., & Louveaux, F. (1997). Introduction to stochastic programming. New York: Springer.
Cervellera, C., Chen, V. C. P., & Wen, A. (2006). Optimization of a large-scale water reservoir network by stochastic dynamic programming with efficient state space discretization. European Journal of Operational Research, 171, 1139–1151.
Chen, V. C. P. (1999). Application of MARS and orthogonal arrays to inventory forecasting stochastic dynamic programs. Computational Statistics and Data Analysis, 30, 317–341.
Chen, V. C. P., Ruppert, D., & Shoemaker, C. A. (1999). Applying experimental design and regression splines to high-dimensional continuous-state stochastic dynamic programming. Operations Research, 47, 38–53.
Chen, V. C. P., Günther, D., & Johnson, E. L. (2003). Solving for an optimal airline yield management policy via statistical learning. Journal of the Royal Statistical Society. Series C, 52(1), 1–12.
Chen, V. C. P., Tsui, K.-L., Barton, R. R., & Meckesheimer, M. (2006). Design, modeling, and applications of computer experiments. IIE Transactions, 38, 273–291.
Efron, B. (2004). Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. Journal of the American Statistical Association, 99, 99–104.
Elomaa, T., & Rousu, J. (2002). Fast minimum training error discretization. In Proceedings of the nineteenth international conference on machine learning, Sydney, Australia (pp. 131–138).
Fayyad, U. M., & Irani, K. B. (1992). On the handling of continuous-valued attributes in decision tree generation. Machine Learning, 8(1), 82–102.
Friedman, J. H. (1991). Multivariate adaptive regression splines (with discussion). Annals of Statistics, 19, 1–141.
Gopalakrishnan, B., & Johnson, E. L. (2005). Airline crew scheduling: state-of-the-art. Annals of Operations Research, 140, 305–337.
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157–1182.
Jain, A. K., Duin, R., & Mao, J. (2000). Statistical pattern recognition: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 4–37.
Jolliffe, I. T. (2002). Principal components analysis. New York: Springer.
Kim, S. B., Tsui, K. L., & Borodovsky, M. (2006). Multiple hypothesis testing in large-scale contingency tables: inferring patterns of pair-wise amino acid association in β-sheets. International Journal of Bioinformatics Research and Applications, 2, 193–217.
Kim, S. B., Wang, Z., Oraintara, S., Temiyasathit, C., & Wongsawat, Y. (2008). Feature selection and classification of high-resolution NMR spectra in the complex wavelet transform domain. Chemometrics and Intelligent Laboratory Systems, 90(2), 161–168.
Kleijnen, J. P. C. (2005). An overview of the design and analysis of simulation experiments for sensitivity analysis. European Journal of Operational Research, 164(2), 287–300.
McGill, J., & van Ryzin, G. J. (1999). Revenue management: research overview and prospects. Transportation Science, 33, 233–256.
Mitchell, T. M. (1997). Machine learning. New York: McGraw-Hill.
Pilla, V. L. (2006). Robust airline fleet assignment. PhD thesis, University of Texas at Arlington.
Pilla, V. L., Rosenberger, J. M., Chen, V. C. P., & Smith, B. (2008). A statistical computer experiments approach to airline fleet assignment. IIE Transactions, 40, 524–537.
Pilla, V. L., Rosenberger, J. M., Chen, V. C. P., Engsuwan, N., & Siddappa, S. (2012). A multivariate adaptive regression splines cutting plane approach for solving a two-stage stochastic programming fleet assignment model. European Journal of Operational Research, 216, 162–171.
Powell, W. B. (2007). Approximate dynamic programming: solving the curses of dimensionality. Hoboken: Wiley.
Sacks, J., Welch, W. J., Mitchell, T. J., & Wynn, H. P. (1989). Design and analysis of computer experiments (with discussion). Statistical Science, 4, 409–423.
Sherali, H. D., Bish, E. K., & Zhu, X. (2006). Airline fleet assignment concepts, models, and algorithms. European Journal of Operational Research, 172, 1–30.
Sherali, H. D., & Zhu, X. (2008). Two-stage fleet assignment model considering stochastic passenger demands. Operations Research, 56(2), 383–399.
Shih, D. T., Chen, V. C. P., & Kim, S. B. (2006). Convex version of multivariate adaptive regression splines. In Proceedings of the 2006 industrial engineering research conference, Orlando, FL, USA.
Storey, J. D., & Tibshirani, R. (2003). Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences of the United States of America, 100, 9440–9445.
Temiyasathit, C., Kim, S. B., & Park, S. K. (2009). Spatial prediction of ozone concentration profiles. Computational Statistics & Data Analysis, 53, 3892–3906.
Tsai, J. C. C., & Chen, V. C. P. (2005). Flexible and robust implementations of multivariate adaptive regression splines within a wastewater treatment stochastic dynamic program. Quality and Reliability Engineering International, 21, 689–699.
Tsai, J. C. C., Chen, V. C. P., Beck, M. B., & Chen, J. (2004). Stochastic dynamic programming formulation for a wastewater treatment decision-making framework. Annals of Operations Research, 132, 207–221. Special issue on applied optimization under uncertainty.
Yang, Z., Chen, V. C. P., Chang, M. E., Murphy, T. E., & Tsai, J. C. C. (2007). Mining and modeling for a metropolitan Atlanta ozone pollution decision-making framework. IIE Transactions, 39, 607–615. Special issue on data mining.
Yang, Z., Chen, V. C. P., Chang, M. E., Sattler, M. L., & Wen, A. (2009). A decision-making framework for ozone pollution control. Operations Research, 57(2), 484–498.


Acknowledgements

We are grateful to the reviewers for their useful comments and suggestions, which greatly improved the quality of the paper. This research was partially supported by the Dallas-Fort Worth International Airport, National Science Foundation grant ECCS-0801802, and Brain Korea 21 (Network Enterprise).

Author information

Authors and Affiliations

Conifer Health Solutions, 2401 Internet Boulevard, Frisco, TX, 75034, USA
Dachuan T. Shih
School of Industrial Management Engineering, Korea University, Seoul, Republic of Korea
Seoung Bum Kim
Department of Industrial and Manufacturing Systems Engineering, The University of Texas at Arlington, Campus Box 19017, Arlington, TX, 76019-0017, USA
Victoria C. P. Chen & Jay M. Rosenberger
American Airlines, 4333 Amon Carter Blvd., MD 5358, Fort Worth, TX, 76155, USA
Venkata L. Pilla


Corresponding author

Correspondence to Seoung Bum Kim.

Appendix: Airline fleet assignment model formulation

The optimization formulation from Pilla et al. (2008) is reproduced here for the readers’ reference.

Let L be the set of flight legs (indexed by l), F the set of fleet types (indexed by f), and G the set of crew-compatible families (indexed by g) that can be used for each of the legs l∈L. Since we assign crew-compatible families in the first stage, for each leg l∈L and each crew-compatible family g∈G, let the binary variable $x_{gl}$ be defined such that

$$x_{gl} = \begin{cases} 1 & \text{if crew-compatible family } g \text{ is assigned to flight leg } l, \\ 0 & \text{otherwise.} \end{cases}$$

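In the second stage, we assign specific aircraft within the crew-compatible family. As such, for each leg l∈L, each aircraft type f∈F, and each scenario ξ∈Ξ, let the binary variable $x^{\xi}_{fl}$ be defined such that

$$x^{\xi}_{fl} = \begin{cases} 1 & \text{if aircraft type } f \text{ is assigned to flight leg } l \text{ in scenario } \xi, \\ 0 & \text{otherwise.} \end{cases}$$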

Since a combined FAM and PMM model is used, let the decision variable $z^{\xi}_{i}$ represent the number of booked passengers for itinerary-fare class i in scenario ξ.

For combined FAM and PMM, consider the following additional parameters:

$S$ = set of stations, indexed by $s$,
$I$ = set of itinerary-fare classes, indexed by $i$,
$V$ = set of nodes in the entire network, indexed by $v$,
$f(v)$ = fleet type associated with node $v$,
$A_v$ = set of flights arriving at node $v$,
$D_v$ = set of flights departing at node $v$,
$M_f$ = number of aircraft of type $f$,
$f_i$ = fare for itinerary-fare class $i$,
$C_{fl}$ = cost if aircraft type $f$ is assigned to flight leg $l$,
$a^{\xi}_{v^{+}}$ = value of ground arc leaving node $v$ for scenario $\xi$,
$a^{\xi}_{v^{-}}$ = value of ground arc entering node $v$ for scenario $\xi$,
$O_f$ = set of arcs that include the plane count hour for fleet type $f$, indexed by $o$,
$L_0$ = set of flight legs in air at the plane count hour,
$\mathrm{Cap}_f$ = capacity of aircraft type $f$,
$D^{\xi}_{i}$ = demand for itinerary-fare class $i$ in scenario $\xi$.

The two-stage formulation can be represented as:

[Equations (4)–(9) were not preserved in this copy; see Pilla et al. (2008) for the full formulation.]
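For orientation, a second-stage problem for a single scenario ξ that is consistent with the parameter list above and with the constraint descriptions in this appendix might be written as follows. This is a reconstruction sketch, not a copy of the equations in Pilla et al. (2008). It assumes that each family g can be treated as a set of fleet types, that "l ∈ i" means itinerary-fare class i uses leg l, that demand acts as an upper bound on bookings, and that ground-arc values are nonnegative; the descriptive labels are added here and do not claim to match the original equation numbers.

$$
\begin{aligned}
\max \quad & \sum_{i \in I} f_i\, z^{\xi}_{i} - \sum_{l \in L} \sum_{f \in F} C_{fl}\, x^{\xi}_{fl} \\
\text{s.t.} \quad & \sum_{l \in A_v} x^{\xi}_{f(v)l} + a^{\xi}_{v^{-}} - \sum_{l \in D_v} x^{\xi}_{f(v)l} - a^{\xi}_{v^{+}} = 0, && v \in V && \text{(balance)} \\
& \sum_{f \in g} x^{\xi}_{fl} = x_{gl}, && l \in L,\ g \in G && \text{(cover)} \\
& \sum_{o \in O_f} a^{\xi}_{o} + \sum_{l \in L_0} x^{\xi}_{fl} \le M_f, && f \in F && \text{(plane count)} \\
& \sum_{i \in I:\, l \in i} z^{\xi}_{i} \le \sum_{f \in F} \mathrm{Cap}_f\, x^{\xi}_{fl}, && l \in L && \text{(seat capacity)} \\
& 0 \le z^{\xi}_{i} \le D^{\xi}_{i}, && i \in I && \text{(demand)} \\
& x^{\xi}_{fl} \in \{0, 1\}, \quad a^{\xi}_{o} \ge 0 && && \text{(domains)}
\end{aligned}
$$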

The objective is to maximize profit (revenue − cost) in the second stage by assigning aircraft within the crew-compatible allocation made in the first stage. The block time of a flight leg l is the length of time from the moment the plane leaves the origin station until it arrives at the destination station. Let $b_l$ be the scheduled block time for flight leg l. The cost for each flight leg is calculated as a function of block time and the operating cost per block hour of a particular fleet type, and is given by:

$$C_{fl} = b_{l} \times (\text{operating cost per block hour})_{f}.$$
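As a worked illustration of the cost formula, the following sketch computes $C_{fl}$ for every fleet type and flight leg pair. The leg names, fleet names, block times, and hourly costs are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def leg_cost(block_hours, cost_per_block_hour):
    """C_fl = b_l * (operating cost per block hour)_f."""
    return block_hours * cost_per_block_hour


# Hypothetical scheduled block times b_l, in hours, per flight leg.
block_time = {"L1": 2.5, "L2": 1.0}
# Hypothetical operating cost per block hour, per fleet type.
operating_cost = {"F1": 4000.0, "F2": 6000.0}

# Cost table indexed by (fleet type, flight leg).
cost = {
    (fleet, leg): leg_cost(hours, hourly)
    for leg, hours in block_time.items()
    for fleet, hourly in operating_cost.items()
}
print(cost[("F1", "L1")])  # 2.5 h * 4000 per hour = 10000.0
```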

Constraints in set (4) represent the balance constraints needed to maintain the circulation of aircraft throughout the network. Cover constraints (5) guarantee that aircraft within the crew-compatible family (assigned in the first stage) are allocated. To formulate the plane count constraints (6), we need to count the number of aircraft of each fleet being used at a particular point of the day (generally when there are fewer planes in the air). As such, the ground arcs that cross the time line at the plane count hour and the flights in the air during that time are summed to ensure that the total number of aircraft of a particular fleet type does not exceed the number available. Constraints (7) impose the seat capacity limits, i.e., the sum of all the booked passengers on different itineraries for a flight l should not exceed the capacity of the aircraft assigned. Constraints (8) limit the booked passengers to the forecasted demand.
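The seat capacity and demand conditions described above can be checked for a candidate solution with a short routine. This is a minimal sketch under stated assumptions: the function `check_capacity_and_demand`, its dictionary-based inputs, and the example data are all hypothetical, and demand is read as an upper bound on bookings.

```python
def check_capacity_and_demand(assignment, capacity, itinerary_legs, booked, demand):
    """Return a list of violation messages; an empty list means feasible.

    assignment:     leg -> fleet type assigned to that leg
    capacity:       fleet type -> seat capacity (Cap_f)
    itinerary_legs: itinerary-fare class -> list of legs it uses
    booked:         itinerary-fare class -> booked passengers (z_i)
    demand:         itinerary-fare class -> forecasted demand (D_i)
    """
    violations = []
    # Seat capacity: total bookings on itineraries that use a leg
    # must not exceed the capacity of the aircraft assigned to it.
    load = {leg: 0 for leg in assignment}
    for itinerary, legs in itinerary_legs.items():
        for leg in legs:
            load[leg] += booked[itinerary]
    for leg, fleet in assignment.items():
        if load[leg] > capacity[fleet]:
            violations.append(f"capacity: leg {leg} carries {load[leg]} > {capacity[fleet]}")
    # Demand (assumed reading): bookings lie between 0 and the forecast.
    for itinerary, z in booked.items():
        if z < 0 or z > demand[itinerary]:
            violations.append(f"demand: itinerary {itinerary} books {z}, demand {demand[itinerary]}")
    return violations


# Hypothetical two-leg example.
assignment = {"L1": "F1", "L2": "F2"}
capacity = {"F1": 120, "F2": 150}
itinerary_legs = {"I1": ["L1"], "I2": ["L1", "L2"], "I3": ["L2"]}
demand = {"I1": 80, "I2": 40, "I3": 130}

feasible = check_capacity_and_demand(
    assignment, capacity, itinerary_legs, {"I1": 70, "I2": 40, "I3": 100}, demand)
infeasible = check_capacity_and_demand(
    assignment, capacity, itinerary_legs, {"I1": 90, "I2": 40, "I3": 100}, demand)
print(feasible)
print(infeasible)
```

In the second call, itinerary I1 books more than its demand and pushes the load on leg L1 above the capacity of fleet type F1, so both kinds of violation are reported.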


About this article

Cite this article

Shih, D.T., Kim, S.B., Chen, V.C.P. et al. Efficient computer experiment-based optimization through variable selection. Ann Oper Res 216, 287–305 (2014). https://doi.org/10.1007/s10479-012-1129-y

Issue Date: May 2014
