Abstract
To induce a desired correlation structure among random variables, widely used simulation software relies on the method of Iman and Conover (IC). The underlying premise is that the induced Spearman rank correlation is a meaningful approximation of other correlation measures among the random variables (e.g., Pearson’s correlation). However, as expected, the a posteriori correlation structure in the desired measure often deviates from the induced Spearman correlation structure. Rooted in the same principle as IC, we propose an alternative distribution-free method based on mixed-integer programming to induce a Pearson correlation structure in bivariate or multivariate random vectors. We also extend our distribution-free method to other correlation measures such as Kendall’s coefficient of concordance, the Phi correlation coefficient, and relative risk. We illustrate our method in four different contexts: (1) the simulation of a healthcare facility, (2) the analysis of a manufacturing tandem queue, (3) the imputation of correlated missing data in statistical analysis, and (4) the estimation of the budget overrun risk in a construction project. We also explore the limits of our algorithms by conducting extensive experiments using randomly generated data from multiple distributions.
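The gap between the induced Spearman correlation and the resulting Pearson correlation can be illustrated with a minimal sketch. It is a simplified rank-reordering scheme in the spirit of IC, not the mixed-integer programming method proposed in this article, and it omits the score-correction step of the original IC procedure. The target value 0.8, the sample size, the lognormal and exponential marginals, and all function names are assumptions chosen only for illustration.

```python
import math
import random


def ranks(values):
    """Return 0-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def reorder_to_ranks(sample, reference):
    """Permute `sample` so that its ranks match those of `reference`."""
    sorted_sample = sorted(sample)
    return [sorted_sample[r] for r in ranks(reference)]


rng = random.Random(0)
n = 5000
target = 0.8  # hypothetical target correlation for the reference scores

# Reference scores: bivariate normal pairs with correlation `target`.
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [target * a + math.sqrt(1 - target ** 2) * rng.gauss(0, 1) for a in z1]

# Independent samples from skewed marginals (illustrative choices).
x = [rng.lognormvariate(0, 1) for _ in range(n)]
y = [rng.expovariate(1.0) for _ in range(n)]

# Reordering changes only the pairing, so each marginal sample is preserved.
x_new = reorder_to_ranks(x, z1)
y_new = reorder_to_ranks(y, z2)

rho_s = spearman(x_new, y_new)
rho_p = pearson(x_new, y_new)
print(f"Spearman after reordering: {rho_s:.3f}")
print(f"Pearson after reordering:  {rho_p:.3f}")
```

Because the reordered samples inherit the ranks of the reference scores, their Spearman correlation equals that of the scores, while the Pearson correlation depends on the shape of the marginals and is generally a different number.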




References
Abdella M, Marwala T (2005) The use of genetic algorithms and neural networks to approximate missing data in database. In: IEEE 3rd international conference on computational cybernetics, 2005 (ICCC 2005). IEEE, pp 207–212
Altiok T, Melamed B (2001) The case for modeling correlation in manufacturing systems. IIE Trans 33(9):779–791
Batista G, Monard MC (2003) An analysis of four missing data treatment methods for supervised learning. Appl Artif Intell 17(5–6):519–533
Biswas A (2004) Generating correlated ordinal categorical random samples. Stat Probab Lett 70(1):25–35
Cahen EJ, Mandjes M, Zwart B (2018) Estimating large delay probabilities in two correlated queues. ACM Trans Model Comput Simul 28(1):2
Cario MC, Nelson BL (1997) Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical report, Department of Industrial Engineering and Management Science, Northwestern University
Chakraborty A (2006) Generating multivariate correlated samples. Comput Stat 21(1):103–119
Charmpis DC, Panteli PL (2004) A heuristic approach for the generation of multivariate random samples with specified marginal distributions and correlation matrix. Comput Stat 19(2):283
Clark DE, El-Taha M (1998) Generation of correlated logistic-normal random variates for medical decision trees. Methods Inf Med 37(03):235–238
Conway DA (1979) Multivariate distributions with specified marginals. Technical report no. 145, Stanford University. https://statistics.stanford.edu/sites/default/files/OLK%20NSF%20145.pdf
Cornfield J (1951) A method of estimating comparative rates from clinical data: applications to cancer of the lung, breast, and cervix. J Natl Cancer Inst 11(6):1269–1275
Corredor D, Cabrera N, Medaglia AL, Akhavan-Tabatabaei R (2020) Data-driven approach for the shortest \(\alpha\)-reliable path problem. COPA working paper
Dai YS, Xie M, Poh KL, Ng SH (2004) A model for correlated failures in n-version programming. IIE Trans 36(12):1183–1192
Deb R, Liew AW-C (2016) Missing value imputation for the analysis of incomplete traffic accident data. Inf Sci 339:274–289
Desaulniers G, Desrosiers J, Solomon MM (2006) Column generation, vol 5. Springer, New York
Dias CTDS, Samaranayaka A, Manly B (2008) On the use of correlated beta random variables with animal population modeling. Ecol Model 215:293–300
Ghosh S, Henderson SG (2003) Behavior of the NORTA method for correlated random vector generation as the dimension increases. ACM Trans Model Comput Simul 13(3):276–294
Gross D, Harris CM (1985) Fundamentals of queueing theory. Wiley, New York
Haas CN (1999) On modeling correlated random variables in risk assessment. Risk Anal 19(6):1205–1214
Harris CM, Hoffman KL, Yarrow L-A (1995a) Obtaining minimum-correlation Latin hypercube sampling plans using an IP-based heuristic. OR Spektrum 17(2–3):139–148
Harris CM, Hoffman KL, Yarrow L-A (1995b) Using integer programming techniques for the solution of an experimental design problem. Ann Oper Res 58(3):243–260
Hill RR, Reilly CH (1994) Composition for multivariate random variables. In: Proceedings of winter simulation conference. IEEE, pp 332–339
Hill RR, Reilly CH (2000) The effects of coefficient correlation structure in two-dimensional knapsack problems on solution procedure performance. Manag Sci 46(2):302–317
Iman RL, Conover WJ (1982) A distribution-free approach to inducing rank correlation among input variables. Commun Stat Simul Comput 11(3):311–334
Kendall MG, Babington-Smith B (1939) The problem of m rankings. Ann Math Stat 10(3):275–287
Kolev N, Paiva D (2008) Random sums of exchangeable variables and actuarial applications. Insur Math Econ 42(1):147–153
Law AM, Kelton WD (2000) Simulation modeling and analysis, 3rd edn. McGraw-Hill, New York
L’Ecuyer P, Meliani L, Vaucher J (2002) SSJ: a framework for stochastic simulation in Java. In: Proceedings of the 2002 winter simulation conference. IEEE, Piscataway, NJ, pp 234–242
Legendre P (2005) Species associations: the Kendall coefficient of concordance revisited. J Agric Biol Environ Stat 10(2):226–245
Leschied JR, Mazza MB, Davenport MS, Chong ST, Smith EA, Hoff CN, Ladino-Torres MF, Khalatbari S, Ehrlich PF, Dillman JR (2016) Inter-radiologist agreement for CT scoring of pediatric splenic injuries and effect on an established clinical practice guideline. Pediatr Radiol 46(2):229–236
Levitin G, Xie M (2006) Performance distribution of a fault-tolerant system in the presence of failure correlation. IIE Trans 38(6):499–509
Li ST, Hammond JL (1975) Generation of pseudorandom numbers with specified univariate distributions and correlation coefficients. IEEE Trans Syst Man Cybern 5:557–561
Little RJA, Rubin DB (2019) Statistical analysis with missing data, vol 793. Wiley, New York
Lübbecke ME, Desrosiers J (2005) Selected topics in column generation. Oper Res 53(6):1007–1023
Lurie PM, Goldberg MS (1998) An approximate method for sampling correlated random variables from partially-specified distributions. Manag Sci 44(2):203–218
Medaglia AL, Sefair JA (2009) Generating correlated random vectors using mixed-integer programming. In: Proceedings of the IIE annual conference. Institute of Industrial and Systems Engineers (IISE), p 1759
Mildenhall SJ (2005) Correlation and aggregate loss distributions with an emphasis on the Iman–Conover method. http://www.casact.org/pubs/forum/06wforum/06w105.pdf. Part one of “The Report of the Research Working Party on Correlations and Dependencies Among All Risk Sources.” Casualty Actuarial Society Forum (Winter 2005)
Mitchell CR, Paulson AS, Beswick CA (1977) Effect of correlated exponential service times on single server tandem queues. Naval Res Logist 24(1):95–112
Moorthy K, Saberi Mohamad M, Deris S (2014) A review on missing value imputation algorithms for microarray gene expression data. Curr Bioinform 9(1):18–22
Morris JA, Gardner MJ (1988) Calculating confidence intervals for relative risk (odds ratios) and standardised ratios and rates. Br Med J 296(6632):1313–1316
Nasr WW, Maddah B (2015) Continuous (s, S) policy with MMPP correlated demand. Eur J Oper Res 246(3):874–885
Oracle (2019) Oracle® Crystal Ball reference and examples guide. http://www.hstoday.us/. Release 11.1.2.4. Accessed July 2019
Park CG, Dong WS (1998) An algorithm for generating correlated random variables in a class of infinitely divisible distributions. J Stat Comput Simul 61(1–2):127–139
Park CG, Park T, Shin DW (1996) A simple method for generating correlated binary variates. Am Stat 50(4):306–310
Patuwo BE, Disney RL, McNickle DC (1993) The effect of correlated arrivals on queues. IIE Trans 25(3):105–110
Polge RJ, Holliday EM, Bhagavan BK (1973) Generation of a pseudo-random set with desired correlation and probability distribution. Simulation 20(5):153–158
Pouillot R, Delignette-Muller M-L (2010) Evaluating variability and uncertainty in microbial quantitative risk assessment using two R packages. Int J Food Microbiol 142(3):330–340
Qaqish BF (2003) A family of multivariate binary distributions for simulating correlated binary variables with specified marginal means and correlations. Biometrika 90(2):455–463
Reilly CH (2009) Synthetic optimization problem generation: show us the correlations! INFORMS J Comput 21(3):458–467
Rosenfeld S (2008) Approximate bivariate gamma generator with prespecified correlation and different marginal shapes. ACM Trans Model Comput Simul 18(4):16
Schmeiser BW, Lal R (1982) Bivariate gamma random vectors. Oper Res 30(2):355–374
Sefair JA, Méndez CY, Babat O, Medaglia AL, Zuluaga LF (2017) Linear solution schemes for mean-semivariance project portfolio selection problems: an application in the oil and gas industry. Omega 68:39–48
Sheskin DJ (2000) Handbook of parametric and nonparametric statistical procedures, 3rd edn. Chapman and Hall-CRC, Boca Raton
Shin K, Pasupathy R (2010) An algorithm for fast generation of bivariate Poisson random vectors. INFORMS J Comput 22(1):81–92
Shults J (2017) Simulating longer vectors of correlated binary random variables via multinomial sampling. Comput Stat Data Anal 114:1–11
Sigler EA, Tallent-Runnels MK (2006) Examining the validity of scores from an instrument designed to measure metacognition of problem solving. J Gener Psychol 133(2):257–276
Stanfield PM, Wilson JR, King RE (2004) Flexible modelling of correlated operation times with application in product-reuse facilities. Int J Prod Res 42(11):2179–2196
Todd CR, Ng MP (2001) Generating unbiased correlated random survival rates for stochastic population models. Ecol Model 144(1):1–11
Touran A (1993) Probabilistic cost estimating with subjective correlations. J Constr Eng Manag 119(1):58–71
Touran A, Suphot L (1997) Rank correlations in simulating construction cost. J Constr Eng Manag 123(3):297–301
Toyoda Y (1975) A simplified algorithm for obtaining approximate solutions to zero-one programming problems. Manag Sci 21(12):1417–1427. https://doi.org/10.1287/mnsc.21.12.1417
Van der Geest PAG (1998) An algorithm to generate samples of multi-variate distributions with correlated marginals. Comput Stat Data Anal 27(3):271–289
Wallis WA (1939) The correlation ratio for ranked data. J Am Stat Assoc 34(207):533–538
Xiao Q (2017) Generating correlated random vector involving discrete variables. Commun Stat Theory Methods 46(4):1594–1605
Yan C, Kung J (2016) Robust aircraft routing. Transp Sci 52(1):118–133
Young DJ, Beaulieu NC (2000) The generation of correlated Rayleigh random variates by inverse discrete Fourier transform. IEEE Trans Commun 48(7):1114–1127
Zhang Y, Khani A (2019) An algorithm for reliable shortest path problem with travel time correlations. Transp Res Part B Methodol 121:92–113. https://doi.org/10.1016/j.trb.2018.12.011
Acknowledgements
The authors would like to thank Professor Jim Wilson from NC State University for sharing his encouraging and valuable input at an earlier stage of this work. The authors sincerely thank Professor Douglas Montgomery at Arizona State University for his valuable comments to improve the manuscript. The authors also thank Gurobi and FICO for providing access to their commercial optimization solvers under their academic licensing programs, and the two anonymous reviewers, whose comments greatly improved the article. This material is based upon work supported by Dr. Sefair’s National Science Foundation Grant No. 1740042.
Cite this article
Sefair, J.A., Guaje, O. & Medaglia, A.L. A column-oriented optimization approach for the generation of correlated random vectors. OR Spectrum 43, 777–808 (2021). https://doi.org/10.1007/s00291-021-00620-5
DOI: https://doi.org/10.1007/s00291-021-00620-5