
Efficient computer experiment-based optimization through variable selection

Annals of Operations Research

Abstract

A computer experiment-based optimization approach employs design of experiments and statistical modeling to represent a complex objective function that can only be evaluated pointwise by running a computer model. In large-scale applications, the number of variables can be very large, and direct use of computer experiments would require an exceedingly large experimental design and, consequently, significant computational effort. If a large portion of the variables have little impact on the objective, these variables should be eliminated before the complete set of computer experiments is performed. This is a variable selection task. The ideal variable selection method for this task should handle unknown nonlinear structure, should be computationally fast, and should be applicable after a small number of computer experiment runs, likely fewer runs (N) than the number of variables (P). Conventional variable selection techniques assume linear model forms and cannot be applied to this “large P and small N” problem. In this paper, we present a framework that adds a variable selection step prior to computer experiment-based optimization, and we consider data mining methods, based on principal components analysis and on multiple testing with false discovery rate control, that are appropriate for our variable selection task. An airline fleet assignment case study is used to illustrate our approach.
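The abstract refers to multiple testing based on the false discovery rate (FDR). As a minimal sketch of one standard FDR procedure, the Benjamini–Hochberg (1995) step-up rule cited in the references is shown below. It assumes a p-value has already been computed for each of the P candidate variables (how the paper obtains these is not shown in this excerpt), and it is not claimed to be the paper's exact selection procedure. The function name `benjamini_hochberg`, the level `q`, and the p-values are illustrative.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[int]:
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up rule.

    Controls the false discovery rate at level q when the test statistics are
    independent (or positively dependent); see Benjamini & Hochberg (1995).
    """
    m = len(p_values)
    # Rank hypotheses by p-value, smallest first, keeping original indices.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) with p_(k) <= (k / m) * q; step-up, so scan all ranks.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject the k_max smallest p-values, reported in original variable order.
    return sorted(order[:k_max])


# Hypothetical p-values for P = 10 candidate variables (illustrative only).
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
selected = benjamini_hochberg(p, q=0.05)
print(selected)  # [0, 1]
```

Variables whose indices are returned would be retained for the full set of computer experiments. If the tests may be arbitrarily dependent, the Benjamini–Yekutieli (2001) variant replaces q with q / (1 + 1/2 + ... + 1/m), which is more conservative.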


Fig. 1
Fig. 2


References

  • Bellman, R. E. (1957). Dynamic programming. Princeton: Princeton University Press.


  • Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B, 57, 289–300.


  • Benjamini, Y., & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29, 1165–1188.


  • Berge, M. E., & Hopperstad, C. A. (1993). Demand driven dispatch: a method of dynamic aircraft capacity assignment, models and algorithms. Operations Research, 41(1), 153–168.


  • Birge, J. R., & Louveaux, F. (1997). Introduction to stochastic programming. New York: Springer.


  • Cervellera, C., Chen, V. C. P., & Wen, A. (2006). Optimization of a large-scale water reservoir network by stochastic dynamic programming with efficient state space discretization. European Journal of Operational Research, 171, 1139–1151.


  • Chen, V. C. P. (1999). Application of MARS and orthogonal arrays to inventory forecasting stochastic dynamic programs. Computational Statistics and Data Analysis, 30, 317–341.


  • Chen, V. C. P., Ruppert, D., & Shoemaker, C. A. (1999). Applying experimental design and regression splines to high-dimensional continuous-state stochastic dynamic programming. Operations Research, 47, 38–53.


  • Chen, V. C. P., Günther, D., & Johnson, E. L. (2003). Solving for an optimal airline yield management policy via statistical learning. Journal of the Royal Statistical Society. Series C, 52(1), 1–12.


  • Chen, V. C. P., Tsui, K.-L., Barton, R. R., & Meckesheimer, M. (2006). Design, modeling, and applications of computer experiments. IIE Transactions, 38, 273–291.


  • Efron, B. (2004). Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. Journal of the American Statistical Association, 99, 99–104.


  • Elomaa, T., & Rousu, J. (2002). Fast minimum training error discretization. In Proceedings of the nineteenth international conference on machine learning, Sydney, Australia (pp. 131–138).


  • Fayyad, U. M., & Irani, K. B. (1992). On the handling of continuous-valued attributes in decision tree generation. Machine Learning, 8(1), 82–102.


  • Friedman, J. H. (1991). Multivariate adaptive regression splines (with discussion). Annals of Statistics, 19, 1–141.


  • Gopalakrishnan, B., & Johnson, E. L. (2005). Airline crew scheduling: state-of-the-art. Annals of Operations Research, 140, 305–337.


  • Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157–1182.


  • Jain, A. K., Duin, R., & Mao, J. (2000). Statistical pattern recognition: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 4–37.


  • Jolliffe, I. T. (2002). Principal component analysis. New York: Springer.


  • Kim, S. B., Tsui, K. L., & Borodovsky, M. (2006). Multiple hypothesis testing in large-scale contingency tables: inferring patterns of pair-wise amino acid association in β-sheets. International Journal of Bioinformatics Research and Applications, 2, 193–217.


  • Kim, S. B., Wang, Z., Oraintara, S., Temiyasathit, C., & Wongsawat, Y. (2008). Feature selection and classification of high-resolution NMR spectra in the complex wavelet transform domain. Chemometrics and Intelligent Laboratory Systems, 90(2), 161–168.


  • Kleijnen, J. P. C. (2005). An overview of the design and analysis of simulation experiments for sensitivity analysis. European Journal of Operational Research, 164(2), 287–300.


  • McGill, J., & van Ryzin, G. J. (1999). Revenue management: research overview and prospects. Transportation Science, 33, 233–256.


  • Mitchell, T. M. (1997). Machine learning. New York: McGraw-Hill.


  • Pilla, V. L. (2006). Robust airline fleet assignment. PhD thesis, University of Texas at Arlington.

  • Pilla, V. L., Rosenberger, J. M., Chen, V. C. P., & Smith, B. (2008). A statistical computer experiments approach to airline fleet assignment. IIE Transactions, 40, 524–537.


  • Pilla, V. L., Rosenberger, J. M., Chen, V. C. P., Engsuwan, N., & Siddappa, S. (2012). A multivariate adaptive regression splines cutting plane approach for solving a two-stage stochastic programming fleet assignment model. European Journal of Operational Research, 216, 162–171.


  • Powell, W. B. (2007). Approximate dynamic programming: solving the curses of dimensionality. Hoboken: Wiley.


  • Sacks, J., Welch, W. J., Mitchell, T. J., & Wynn, H. P. (1989). Design and analysis of computer experiments (with discussion). Statistical Science, 4, 409–423.


  • Sherali, H. D., Bish, E. K., & Zhu, X. (2006). Airline fleet assignment concepts, models, and algorithms. European Journal of Operational Research, 172, 1–30.


  • Sherali, H. D., & Zhu, X. (2008). Two-stage fleet assignment model considering stochastic passenger demands. Operations Research, 56(2), 383–399.


  • Shih, D. T., Chen, V. C. P., & Kim, S. B. (2006). Convex version of multivariate adaptive regression splines. In Proceedings of the 2006 industrial engineering research conference, Orlando, FL, USA.

  • Storey, J. D., & Tibshirani, R. (2003). Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences of the United States of America, 100, 9440–9445.


  • Temiyasathit, C., Kim, S. B., & Park, S. K. (2009). Spatial prediction of ozone concentration profiles. Computational Statistics & Data Analysis, 53, 3892–3906.


  • Tsai, J. C. C., & Chen, V. C. P. (2005). Flexible and robust implementations of multivariate adaptive regression splines within a wastewater treatment stochastic dynamic program. Quality and Reliability Engineering International, 21, 689–699.


  • Tsai, J. C. C., Chen, V. C. P., Beck, M. B., & Chen, J. (2004). Stochastic dynamic programming formulation for a wastewater treatment decision-making framework. Annals of Operations Research, 132, 207–221. Special issue on applied optimization under uncertainty.


  • Yang, Z., Chen, V. C. P., Chang, M. E., Murphy, T. E., & Tsai, J. C. C. (2007). Mining and modeling for a metropolitan Atlanta ozone pollution decision-making framework. IIE Transactions, 39, 607–615. Special issue on data mining.


  • Yang, Z., Chen, V. C. P., Chang, M. E., Sattler, M. L., & Wen, A. (2009). A decision-making framework for ozone pollution control. Operations Research, 57(2), 484–498.



Acknowledgements

We are grateful to the reviewers for their useful comments and suggestions, which greatly improved the quality of the paper. This research was partially supported by the Dallas-Fort Worth International Airport, National Science Foundation grant ECCS-0801802, and Brain Korea 21 (Network Enterprise).

Author information


Correspondence to Seoung Bum Kim.

Appendix: Airline fleet assignment model formulation


The optimization formulation from Pilla et al. (2008) is reproduced here for the readers’ reference.

Let \(L\) be the set of flight legs (indexed by \(l\)). Let \(F\) denote the set of fleet types (indexed by \(f\)), and let \(G\) be the set of crew-compatible families (indexed by \(g\)) that can be used for each leg \(l \in L\). Since crew-compatible families are assigned in the first stage, for each leg \(l \in L\) and each crew-compatible family \(g \in G\), define the binary variable \(x_{gl}\) such that

$$x_{gl} = \begin{cases} 1, & \text{if crew-compatible family } g \text{ is assigned to flight leg } l,\\ 0, & \text{otherwise.} \end{cases}$$

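In the second stage, we assign specific aircraft types within the crew-compatible family. As such, for each leg \(l \in L\), each aircraft type \(f \in F\), and each scenario \(\xi \in \Xi\), define the binary variable \(x^{\xi}_{fl}\) such that

$$x^{\xi}_{fl} = \begin{cases} 1, & \text{if aircraft type } f \text{ is assigned to flight leg } l \text{ in scenario } \xi,\\ 0, & \text{otherwise.} \end{cases}$$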

Since a combined fleet assignment model (FAM) and passenger mix model (PMM) is used, let the decision variable \(z^{\xi}_{i}\) represent the number of booked passengers for itinerary-fare class \(i\) in scenario \(\xi\).

For the combined FAM and PMM, consider the following additional parameters:

  • \(S\) = set of stations, indexed by \(s\),

  • \(I\) = set of itinerary-fare classes, indexed by \(i\),

  • \(V\) = set of nodes in the entire network, indexed by \(v\),

  • \(f(v)\) = fleet type associated with node \(v\),

  • \(A_v\) = set of flights arriving at node \(v\),

  • \(D_v\) = set of flights departing from node \(v\),

  • \(M_f\) = number of aircraft of type \(f\),

  • \(f_i\) = fare for itinerary-fare class \(i\),

  • \(C_{fl}\) = cost if aircraft type \(f\) is assigned to flight leg \(l\),

  • \(a^{\xi}_{v^{+}}\) = value of the ground arc leaving node \(v\) in scenario \(\xi\),

  • \(a^{\xi}_{v^{-}}\) = value of the ground arc entering node \(v\) in scenario \(\xi\),

  • \(O_f\) = set of arcs that include the plane count hour for fleet type \(f\), indexed by \(o\),

  • \(L_0\) = set of flight legs in the air at the plane count hour,

  • \(\mathit{Cap}_f\) = capacity of aircraft type \(f\),

  • \(D^{\xi}_{i}\) = demand for itinerary-fare class \(i\) in scenario \(\xi\).

The two-stage formulation can be represented as:

(4)
(5)
(6)
(7)
(8)
(9)

The objective is to maximize profit (revenue − cost) in the second stage by assigning aircraft within the crew-compatible allocation made in the first stage. The block time of a flight leg \(l\) is the length of time from the moment the plane leaves the origin station until it arrives at the destination station. Let \(b_l\) be the scheduled block time for flight leg \(l\). The cost of flying leg \(l\) with fleet type \(f\) is the block time multiplied by that fleet type's operating cost per block hour:

$$C_{fl} = b_{l} \times (\text{operating cost per block hour})_{f}.$$
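As a worked example of this cost formula, the sketch below tabulates \(C_{fl}\) for every fleet/leg pair. The leg labels, fleet labels, block times, and hourly rates are hypothetical. For instance, a 2.5-hour leg flown by a fleet type that costs 3,000 per block hour has \(C_{fl} = 2.5 \times 3000 = 7500\).

```python
from typing import Dict, Tuple


def leg_cost_table(
    block_time: Dict[str, float],           # b_l in block hours, per flight leg l
    cost_per_block_hour: Dict[str, float],  # operating cost per block hour, per fleet type f
) -> Dict[Tuple[str, str], float]:
    """Return C_fl = b_l * (operating cost per block hour)_f for every (f, l) pair."""
    return {
        (fleet, leg): hours * rate
        for fleet, rate in cost_per_block_hour.items()
        for leg, hours in block_time.items()
    }


# Hypothetical legs and fleet types; numbers are made up for illustration.
costs = leg_cost_table(
    block_time={"L1": 2.5, "L2": 1.25},
    cost_per_block_hour={"fA": 3000.0, "fB": 2800.0},
)
print(costs[("fA", "L1")], costs[("fB", "L2")])  # 7500.0 3500.0
```

Because \(C_{fl}\) depends only on the leg and the fleet type, it can be precomputed once and reused as objective coefficients for every scenario.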

Constraints in set (4) are the balance constraints needed to maintain the circulation of aircraft throughout the network. Cover constraints (5) guarantee that each leg is flown by an aircraft type belonging to the crew-compatible family assigned to that leg in the first stage. To formulate the plane count constraints (6), we count the number of aircraft of each fleet type in use at a particular point of the day (generally a time when few planes are in the air): the ground arcs that cross the time line at the plane count hour and the flights in the air at that time are summed, ensuring that the number of aircraft of each fleet type used does not exceed the number available. Constraints (7) impose the seat capacity limits, i.e., the total number of passengers booked on all itineraries that use flight leg l must not exceed the capacity of the aircraft assigned to l, and constraints (8) limit the bookings for each itinerary-fare class to its forecasted demand.
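To make the cover, capacity, and demand constraints concrete, the sketch below checks a single scenario's second-stage solution against constraint sets (5), (7), and (8) and then evaluates profit (fare revenue minus \(C_{fl}\)). It is a feasibility checker under stated assumptions, not an optimizer: the constraint forms are read from the verbal description above rather than from the paper's equations (4)–(9), the balance (4) and plane count (6) constraints are omitted, and the function name `second_stage_profit` and all data are hypothetical.

```python
from typing import Dict, List, Tuple


def second_stage_profit(
    leg_fleet: Dict[str, str],               # fleet type f with x^xi_fl = 1, per leg l
    leg_family: Dict[str, str],              # family g with x_gl = 1, per leg l (stage 1)
    fleet_family: Dict[str, str],            # crew-compatible family of each fleet type
    capacity: Dict[str, float],              # Cap_f
    leg_cost: Dict[Tuple[str, str], float],  # C_fl keyed by (fleet, leg)
    itinerary_legs: Dict[str, List[str]],    # legs flown by itinerary-fare class i
    fare: Dict[str, float],                  # f_i
    demand: Dict[str, float],                # D^xi_i
    booked: Dict[str, float],                # z^xi_i
) -> float:
    """Check one scenario's assignment and bookings, then return revenue - cost."""
    # Cover (5): each leg's fleet must belong to the family fixed in stage 1.
    for leg, fleet in leg_fleet.items():
        if fleet_family[fleet] != leg_family[leg]:
            raise ValueError(f"leg {leg}: fleet {fleet} is outside family {leg_family[leg]}")

    # Demand (8): bookings for an itinerary-fare class cannot exceed its demand.
    for itin, pax in booked.items():
        if pax < 0 or pax > demand[itin]:
            raise ValueError(f"itinerary {itin}: {pax} bookings vs demand {demand[itin]}")

    # Capacity (7): total passengers on a leg cannot exceed the assigned fleet's seats.
    load = {leg: 0.0 for leg in leg_fleet}
    for itin, pax in booked.items():
        for leg in itinerary_legs[itin]:
            load[leg] += pax
    for leg, pax in load.items():
        if pax > capacity[leg_fleet[leg]]:
            raise ValueError(f"leg {leg}: load {pax} exceeds capacity")

    # Objective: fare revenue minus the operating cost of the assigned fleets.
    revenue = sum(fare[itin] * pax for itin, pax in booked.items())
    cost = sum(leg_cost[(fleet, leg)] for leg, fleet in leg_fleet.items())
    return revenue - cost


# Hypothetical two-leg network with one crew-compatible family g1.
profit = second_stage_profit(
    leg_fleet={"L1": "fA", "L2": "fA"},
    leg_family={"L1": "g1", "L2": "g1"},
    fleet_family={"fA": "g1", "fB": "g1"},
    capacity={"fA": 150, "fB": 120},
    leg_cost={("fA", "L1"): 7500.0, ("fA", "L2"): 3750.0},
    itinerary_legs={"i1": ["L1"], "i2": ["L1", "L2"], "i3": ["L2"]},
    fare={"i1": 200.0, "i2": 300.0, "i3": 150.0},
    demand={"i1": 100, "i2": 60, "i3": 80},
    booked={"i1": 90, "i2": 60, "i3": 80},
)
print(profit)  # 36750.0
```

Here leg L1 carries 90 + 60 = 150 passengers, exactly the capacity of fleet type fA, so the solution is feasible. In the actual model these conditions are imposed as linear constraints on the binary and integer decision variables and solved jointly, rather than checked after the fact.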


About this article

Cite this article

Shih, D.T., Kim, S.B., Chen, V.C.P. et al. Efficient computer experiment-based optimization through variable selection. Ann Oper Res 216, 287–305 (2014). https://doi.org/10.1007/s10479-012-1129-y



  • DOI: https://doi.org/10.1007/s10479-012-1129-y
