Abstract
A computer experiment-based optimization approach employs design of experiments and statistical modeling to represent a complex objective function that can only be evaluated pointwise by running a computer model. In large-scale applications, the number of variables is huge, and direct use of computer experiments would require an exceedingly large experimental design and, consequently, significant computational effort. If a large portion of the variables have little impact on the objective, then there is a need to eliminate these before performing the complete set of computer experiments. This is a variable selection task. The ideal variable selection method for this task should handle unknown nonlinear structure, should be computationally fast, and would be conducted after a small number of computer experiment runs, likely fewer runs (N) than the number of variables (P). Conventional variable selection techniques are based on assumed linear model forms and cannot be applied in this “large P and small N” problem. In this paper, we present a framework that adds a variable selection step prior to computer experiment-based optimization, and we consider data mining methods, using principal components analysis and multiple testing based on false discovery rate, that are appropriate for our variable selection task. An airline fleet assignment case study is used to illustrate our approach.
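As a rough illustration of the false discovery rate screening mentioned above, the sketch below applies the Benjamini-Hochberg step-up rule (Benjamini and Hochberg 1995) to one p-value per candidate variable. How the p-values are obtained, the function name `bh_select`, and the level `q` are assumptions made for illustration; this is not the implementation used in the paper.

```python
import numpy as np

def bh_select(p_values, q=0.05):
    """Return sorted indices of variables kept by the Benjamini-Hochberg rule."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                      # ranks p-values from smallest to largest
    thresholds = q * np.arange(1, m + 1) / m   # step-up thresholds q*k/m
    passed = p[order] <= thresholds
    if not passed.any():
        return np.array([], dtype=int)         # nothing is declared significant
    k = np.max(np.nonzero(passed)[0])          # largest rank meeting its threshold
    return np.sort(order[: k + 1])             # keep every variable up to that rank

# Hypothetical p-values for five candidate variables
selected = bh_select([0.001, 0.8, 0.012, 0.04, 0.6], q=0.05)
```

Variables whose indices are returned would be retained for the full set of computer experiments; the rest would be dropped before the larger design is run.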


References
Bellman, R. E. (1957). Dynamic programming. Princeton: Princeton University Press.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B, 57, 289–300.
Benjamini, Y., & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29, 1165–1188.
Berge, M. E., & Hopperstad, C. A. (1993). Demand driven dispatch: a method of dynamic aircraft capacity assignment, models and algorithms. Operations Research, 41(1), 153–168.
Birge, J. R., & Louveaux, F. (1997). Introduction to stochastic programming. New York: Springer.
Cervellera, C., Chen, V. C. P., & Wen, A. (2006). Optimization of a large-scale water reservoir network by stochastic dynamic programming with efficient state space discretization. European Journal of Operational Research, 171, 1139–1151.
Chen, V. C. P. (1999). Application of MARS and orthogonal arrays to inventory forecasting stochastic dynamic programs. Computational Statistics and Data Analysis, 30, 317–341.
Chen, V. C. P., Ruppert, D., & Shoemaker, C. A. (1999). Applying experimental design and regression splines to high-dimensional continuous-state stochastic dynamic programming. Operations Research, 47, 38–53.
Chen, V. C. P., Günther, D., & Johnson, E. L. (2003). Solving for an optimal airline yield management policy via statistical learning. Journal of the Royal Statistical Society. Series C, 52(1), 1–12.
Chen, V. C. P., Tsui, K.-L., Barton, R. R., & Meckesheimer, M. (2006). Design, modeling, and applications of computer experiments. IIE Transactions, 38, 273–291.
Efron, B. (2004). Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. Journal of the American Statistical Association, 99, 99–104.
Elomaa, T., & Rousu, J. (2002). Fast minimum training error discretization. In Proceedings of the nineteenth international conference on machine learning, Sydney, Australia (pp. 131–138).
Fayyad, U. M., & Irani, K. B. (1992). On the handling of continuous-valued attributes in decision tree generation. Machine Learning, 8(1), 82–102.
Friedman, J. H. (1991). Multivariate adaptive regression splines (with discussion). Annals of Statistics, 19, 1–141.
Gopalakrishnan, B., & Johnson, E. L. (2005). Airline crew scheduling: state-of-the-art. Annals of Operations Research, 140, 305–337.
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157–1182.
Jain, A. K., Duin, R., & Mao, J. (2000). Statistical pattern recognition: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 4–37.
Jolliffe, I. T. (2002). Principal components analysis. New York: Springer.
Kim, S. B., Tsui, K. L., & Borodovsky, M. (2006). Multiple hypothesis testing in large-scale contingency tables: inferring patterns of pair-wise amino acid association in β-sheets. International Journal of Bioinformatics Research and Applications, 2, 193–217.
Kim, S. B., Wang, Z., Oraintara, S., Temiyasathit, C., & Wongsawat, Y. (2008). Feature selection and classification of high-resolution NMR spectra in the complex wavelet transform domain. Chemometrics and Intelligent Laboratory Systems, 90(2), 161–168.
Kleijnen, J. P. C. (2005). An overview of the design and analysis of simulation experiments for sensitivity analysis. European Journal of Operational Research, 164(2), 287–300.
McGill, J., & van Ryzin, G. J. (1999). Revenue management: research overview and prospects. Transportation Science, 33, 233–256.
Mitchell, T. M. (1997). Machine learning. New York: McGraw-Hill.
Pilla, V. L. (2006). Robust airline fleet assignment. PhD thesis, University of Texas at Arlington.
Pilla, V. L., Rosenberger, J. M., Chen, V. C. P., & Smith, B. (2008). A statistical computer experiments approach to airline fleet assignment. IIE Transactions, 40, 524–537.
Pilla, V. L., Rosenberger, J. M., Chen, V. C. P., Engsuwan, N., & Siddappa, S. (2012). A multivariate adaptive regression splines cutting plane approach for solving a two-stage stochastic programming fleet assignment model. European Journal of Operational Research, 216, 162–171.
Powell, W. B. (2007). Approximate dynamic programming: solving the curses of dimensionality. Hoboken: Wiley.
Sacks, J., Welch, W. J., Mitchell, T. J., & Wynn, H. P. (1989). Design and analysis of computer experiments (with discussion). Statistical Science, 4, 409–423.
Sherali, H. D., Bish, E. K., & Zhu, X. (2006). Airline fleet assignment concepts, models, and algorithms. European Journal of Operational Research, 172, 1–30.
Sherali, H. D., & Zhu, X. (2008). Two-stage fleet assignment model considering stochastic passenger demands. Operations Research, 56(2), 383–399.
Shih, D. T., Chen, V. C. P., & Kim, S. B. (2006). Convex version of multivariate adaptive regression splines. In Proceedings of the 2006 industrial engineering research conference, Orlando, FL, USA.
Storey, J. D., & Tibshirani, R. (2003). Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences of the United States of America, 100, 9440–9445.
Temiyasathit, C., Kim, S. B., & Park, S. K. (2009). Spatial prediction of ozone concentration profiles. Computational Statistics & Data Analysis, 53, 3892–3906.
Tsai, J. C. C., & Chen, V. C. P. (2005). Flexible and robust implementations of multivariate adaptive regression splines within a wastewater treatment stochastic dynamic program. Quality and Reliability Engineering International, 21, 689–699.
Tsai, J. C. C., Chen, V. C. P., Beck, M. B., & Chen, J. (2004). Stochastic dynamic programming formulation for a wastewater treatment decision-making framework. Annals of Operations Research, 132, 207–221. Special issue on applied optimization under uncertainty.
Yang, Z., Chen, V. C. P., Chang, M. E., Murphy, T. E., & Tsai, J. C. C. (2007). Mining and modeling for a metropolitan Atlanta ozone pollution decision-making framework. IIE Transactions, 39, 607–615. Special issue on data mining.
Yang, Z., Chen, V. C. P., Chang, M. E., Sattler, M. L., & Wen, A. (2009). A decision-making framework for ozone pollution control. Operations Research, 57(2), 484–498.
Acknowledgements
We are grateful to the reviewers for their useful comments and suggestions, which greatly improved the quality of the paper. This research was partially supported by the Dallas-Fort Worth International Airport, National Science Foundation grant ECCS-0801802, and Brain Korea 21 (Network Enterprise).
Appendix: Airline fleet assignment model formulation
The optimization formulation from Pilla et al. (2008) is reproduced here for the readers’ reference.
Let L be the set of flight legs (indexed by l). Let F denote the set of fleet types (indexed by f), and G be the set of crew-compatible families (indexed by g), which can be used for each of the legs l∈L. Since we assign crew-compatible families in the first stage, for each leg l∈L and for each crew-compatible family type g∈G, let a binary variable \(x_{gl}\) be defined such that
\[x_{gl} = \begin{cases} 1 & \text{if crew-compatible family } g \text{ is assigned to flight leg } l, \\ 0 & \text{otherwise.} \end{cases}\]
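In the second stage, we assign specific aircraft within the crew-compatible family. As such, for each leg l∈L, for each aircraft type f∈F, and for each scenario ξ∈Ξ, let a binary variable \(x^{\xi}_{fl}\) be defined such that
\[x^{\xi}_{fl} = \begin{cases} 1 & \text{if aircraft type } f \text{ is assigned to flight leg } l \text{ in scenario } \xi, \\ 0 & \text{otherwise.} \end{cases}\]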
Since a combined FAM and PMM model is used, let the decision variable \(z^{\xi}_{i}\) represent the number of booked passengers for itinerary-fare class i in scenario ξ.
For combined FAM and PMM, consider the following additional parameters:
- \(S\) = set of stations, indexed by s,
- \(I\) = set of itinerary-fare classes, indexed by i,
- \(V\) = set of nodes in the entire network, indexed by v,
- \(f(v)\) = fleet type associated with node v,
- \(A_{v}\) = set of flights arriving at node v,
- \(D_{v}\) = set of flights departing at node v,
- \(M_{f}\) = number of aircraft of type f,
- \(f_{i}\) = fare for itinerary-fare class i,
- \(C_{fl}\) = cost if aircraft type f is assigned to flight leg l,
- \(a^{\xi}_{v^{+}}\) = value of ground arc leaving node v for scenario ξ,
- \(a^{\xi}_{v^{-}}\) = value of ground arc entering node v for scenario ξ,
- \(O_{f}\) = set of arcs that include the plane count hour for fleet type f, indexed by o,
- \(L_{0}\) = set of flight legs in air at the plane count hour,
- \(\mathrm{Cap}_{f}\) = capacity of aircraft type f,
- \(D^{\xi}_{i}\) = demand for itinerary-fare class i in scenario ξ.
The two-stage formulation can be represented as:






The objective is to maximize profit (revenue − cost) in the second stage by assigning aircraft within the crew-compatible allocation made in the first stage. The block time of a flight leg l is the length of time from the moment the plane leaves the origin station until it arrives at the destination station. Let \(b_{l}\) be the scheduled block time for flight leg l. The cost for each flight leg is calculated as a function of the block time and the operating cost per block hour of a particular fleet type.
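A minimal sketch of this cost calculation follows. It assumes the simplest reading of the description, namely that the cost is the scheduled block time multiplied by the fleet type's operating cost per block hour. The product form, the function name `leg_costs`, and the numbers are illustrative assumptions, not values or code from Pilla et al. (2008).

```python
def leg_costs(block_hours, cost_per_block_hour):
    """Build C[f][l] = b_l * (operating cost per block hour of fleet type f)."""
    return {
        f: {l: b * rate for l, b in block_hours.items()}
        for f, rate in cost_per_block_hour.items()
    }

# Hypothetical data: scheduled block times (hours) and hourly operating costs
block_hours = {"leg1": 2.5, "leg2": 1.0}
cost_per_block_hour = {"fleetA": 3000.0, "fleetB": 4200.0}

C = leg_costs(block_hours, cost_per_block_hour)
```

Here `C["fleetA"]["leg1"]` plays the role of \(C_{fl}\) for fleet type f = fleetA and leg l = leg1.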
Constraints (4) are the balance constraints needed to maintain the circulation of aircraft throughout the network. Cover constraints (5) guarantee that aircraft within the crew-compatible family (assigned in the first stage) are allocated. To formulate the plane count constraints (6), we need to count the number of aircraft of each fleet type in use at a particular point of the day (generally when there are fewer planes in the air). As such, the ground arcs that cross the time line at the plane count hour and the flights in the air at that time are summed to ensure that the total number of aircraft of a particular fleet type does not exceed the number available. Constraints (7) impose the seat capacity limits, i.e., the sum of all booked passengers on the different itineraries using a flight leg l should not exceed the capacity of the aircraft assigned, and constraints (8) keep the booked passengers within the forecasted demand.
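To make the last two constraint sets concrete, the sketch below checks a candidate second-stage solution for a single scenario against the seat capacity limits (7) and the demand limits (8). It assumes that (8) bounds bookings between zero and the forecasted demand, and that the capacity of the aircraft assigned to each leg is already known. The function name, argument names, and data are hypothetical.

```python
def check_capacity_and_demand(z, demand, legs_of_itinerary, assigned_capacity):
    """Return (capacity_ok, demand_ok) for one scenario.

    z                 : booked passengers per itinerary-fare class i
    demand            : forecasted demand D_i per itinerary-fare class i
    legs_of_itinerary : flight legs used by each itinerary-fare class i
    assigned_capacity : seat capacity of the aircraft assigned to each leg l
    """
    # Accumulate the passenger load on every leg from all itineraries using it
    load = {l: 0 for l in assigned_capacity}
    for i, booked in z.items():
        for l in legs_of_itinerary[i]:
            load[l] += booked
    # Seat capacity limits: load on each leg must fit the assigned aircraft
    capacity_ok = all(load[l] <= assigned_capacity[l] for l in load)
    # Demand limits (assumed form): 0 <= z_i <= D_i
    demand_ok = all(0 <= z[i] <= demand[i] for i in z)
    return capacity_ok, demand_ok

# Hypothetical two-itinerary, two-leg example
z = {"i1": 80, "i2": 50}
demand = {"i1": 100, "i2": 50}
legs_of_itinerary = {"i1": ["l1"], "i2": ["l1", "l2"]}

tight = check_capacity_and_demand(z, demand, legs_of_itinerary, {"l1": 120, "l2": 150})
loose = check_capacity_and_demand(z, demand, legs_of_itinerary, {"l1": 140, "l2": 150})
```

In the first call, leg l1 carries 130 passengers against 120 seats, so the capacity check fails while the demand check passes. A larger aircraft on l1 satisfies both.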
Cite this article
Shih, D.T., Kim, S.B., Chen, V.C.P. et al. Efficient computer experiment-based optimization through variable selection. Ann Oper Res 216, 287–305 (2014). https://doi.org/10.1007/s10479-012-1129-y
DOI: https://doi.org/10.1007/s10479-012-1129-y