A column-oriented optimization approach for the generation of correlated random vectors

  • Regular Article
  • OR Spectrum

Abstract

To induce a desired correlation structure among random variables, widely used simulation software relies on the method of Iman and Conover (IC). The underlying premise is that the induced Spearman rank correlation is a meaningful approximation of other correlation measures among the random variables (e.g., Pearson’s correlation). However, as expected, the a posteriori value of the desired correlation measure often deviates from the induced Spearman correlation structure. Rooted in the same principle as IC, we propose an alternative distribution-free method based on mixed-integer programming to induce a Pearson correlation structure in bivariate or multivariate random vectors. We also extend our distribution-free method to other correlation measures such as Kendall’s coefficient of concordance, the Phi correlation coefficient, and relative risk. We illustrate our method in four different contexts: (1) the simulation of a healthcare facility, (2) the analysis of a manufacturing tandem queue, (3) the imputation of correlated missing data in statistical analysis, and (4) the estimation of the budget overrun risk in a construction project. We also explore the limits of our algorithms by conducting extensive experiments using randomly generated data from multiple distributions.
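The gap between induced rank correlation and Pearson correlation can be seen in a small experiment. The following is a minimal sketch, assuming NumPy, of a simplified IC-style rank reordering; it is not the mixed-integer method proposed in the article. The function names (`ranks`, `spearman`, `rank_reorder`), the choice of lognormal and exponential marginals, and the use of standardized random normal scores (instead of the scores used in the original IC paper) are assumptions made for illustration only.

```python
import numpy as np


def ranks(a):
    """0-based ranks of a 1-D array (no tie handling; fine for continuous data)."""
    return np.argsort(np.argsort(a))


def spearman(x):
    """Spearman rank correlation matrix of the columns of x."""
    r = np.column_stack([ranks(col) for col in x.T])
    return np.corrcoef(r, rowvar=False)


def rank_reorder(samples, target_corr, rng):
    """Reorder each column of `samples` so that its ranks follow a reference
    score matrix whose Pearson correlation equals `target_corr`."""
    n, k = samples.shape
    scores = rng.standard_normal((n, k))
    # Standardize so the sample covariance equals the sample correlation.
    scores = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    q = np.linalg.cholesky(np.corrcoef(scores, rowvar=False))
    p = np.linalg.cholesky(target_corr)
    # Undo the scores' own sample correlation, then impose the target.
    reference = scores @ np.linalg.inv(q).T @ p.T
    out = np.empty_like(samples, dtype=float)
    for j in range(k):
        # Each row receives the order statistic matching the reference rank.
        out[:, j] = np.sort(samples[:, j])[ranks(reference[:, j])]
    return out


# Hypothetical usage: two skewed marginals and a target correlation of 0.7.
rng = np.random.default_rng(42)
x = np.column_stack([rng.lognormal(size=2000), rng.exponential(size=2000)])
target = np.array([[1.0, 0.7], [0.7, 1.0]])
y = rank_reorder(x, target, rng)
print("Spearman:", spearman(y)[0, 1])
print("Pearson: ", np.corrcoef(y, rowvar=False)[0, 1])
```

The reordering leaves each marginal sample unchanged and only permutes values within columns, which is what makes this family of methods distribution-free. Comparing the two printed values shows how far the resulting Pearson correlation sits from the rank correlation that was actually induced.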

References

  • Abdella M, Marwala T (2005) The use of genetic algorithms and neural networks to approximate missing data in database. In: IEEE 3rd international conference on computational cybernetics, 2005 (ICCC 2005). IEEE, pp 207–212

  • Altiok T, Melamed B (2001) The case for modeling correlation in manufacturing systems. IIE Trans 33(9):779–791

  • Batista G, Monard MC (2003) An analysis of four missing data treatment methods for supervised learning. Appl Artif Intell 17(5–6):519–533

  • Biswas A (2004) Generating correlated ordinal categorical random samples. Stat Probab Lett 70(1):25–35

  • Cahen EJ, Mandjes M, Zwart B (2018) Estimating large delay probabilities in two correlated queues. ACM Trans Model Comput Simul 28(1):2

  • Cario MC, Nelson BL (1997) Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical report, Department of Industrial Engineering and Management Science, Northwestern University

  • Chakraborty A (2006) Generating multivariate correlated samples. Comput Stat 21(1):103–119

  • Charmpis DC, Panteli PL (2004) A heuristic approach for the generation of multivariate random samples with specified marginal distributions and correlation matrix. Comput Stat 19(2):283

  • Clark DE, El-Taha M (1998) Generation of correlated logistic-normal random variates for medical decision trees. Methods Inf Med 37(03):235–238

  • Conway DA (1979) Multivariate distributions with specified marginals. Technical report no. 145, Stanford University. https://statistics.stanford.edu/sites/default/files/OLK%20NSF%20145.pdf

  • Cornfield J (1951) A method of estimating comparative rates from clinical data: applications to cancer of the lung, breast, and cervix. J Natl Cancer Inst 11(6):1269–1275

  • Corredor D, Cabrera N, Medaglia AL, Akhavan-Tabatabaei R (2020) Data-driven approach for the shortest \(\alpha\)-reliable path problem. COPA working paper

  • Dai YS, Xie M, Poh KL, Ng SH (2004) A model for correlated failures in n-version programming. IIE Trans 36(12):1183–1192

  • Deb R, Liew AW-C (2016) Missing value imputation for the analysis of incomplete traffic accident data. Inf Sci 339:274–289

  • Desaulniers G, Desrosiers J, Solomon MM (2006) Column generation, vol 5. Springer, New York

  • Dias CTDS, Samaranayaka A, Manly B (2008) On the use of correlated beta random variables with animal population modeling. Ecol Model 215:293–300

  • Ghosh S, Henderson SG (2003) Behavior of the NORTA method for correlated random vector generation as the dimension increases. ACM Trans Model Comput Simul 13(3):276–294

  • Gross D, Harris CM (1985) Fundamentals of queueing theory. Wiley, New York

  • Haas CN (1999) On modeling correlated random variables in risk assessment. Risk Anal 19(6):1205–1214

  • Harris CM, Hoffman KL, Yarrow L-A (1995a) Obtaining minimum-correlation Latin hypercube sampling plans using an IP-based heuristic. OR Spektrum 17(2–3):139–148

  • Harris CM, Hoffman KL, Yarrow L-A (1995b) Using integer programming techniques for the solution of an experimental design problem. Ann Oper Res 58(3):243–260

  • Hill RR, Reilly CH (1994) Composition for multivariate random variables. In: Proceedings of winter simulation conference. IEEE, pp 332–339

  • Hill RR, Reilly CH (2000) The effects of coefficient correlation structure in two-dimensional knapsack problems on solution procedure performance. Manag Sci 46(2):302–317

  • Iman RL, Conover W-J (1982) A distribution-free approach to inducing rank correlation among input variables. Commun Stat Simul Comput 11(3):311–334

  • Kendall MG, Babington-Smith B (1939) The problem of m rankings. Ann Math Stat 10(3):275–287

  • Kolev N, Paiva D (2008) Random sums of exchangeable variables and actuarial applications. Insur Math Econ 42(1):147–153

  • Law AM, Kelton WD (2000) Simulation modeling and analysis, 3rd edn. McGraw-Hill, New York

  • L’Ecuyer P, Meliani L, Vaucher J (2002) SSJ: a framework for stochastic simulation in Java. In: Proceedings of the 2002 winter simulation conference. IEEE, Piscataway, NJ, pp 234–242

  • Legendre P (2005) Species associations: the Kendall coefficient of concordance revisited. J Agric Biol Environ Stat 10(2):226–245

  • Leschied JR, Mazza MB, Davenport MS, Chong ST, Smith EA, Hoff CN, Ladino-Torres MF, Khalatbari S, Ehrlich PF, Dillman JR (2016) Inter-radiologist agreement for CT scoring of pediatric splenic injuries and effect on an established clinical practice guideline. Pediatr Radiol 46(2):229–236

  • Levitin G, Xie M (2006) Performance distribution of a fault-tolerant system in the presence of failure correlation. IIE Trans 38(6):499–509

  • Li ST, Hammond JL (1975) Generation of pseudorandom numbers with specified univariate distributions and correlation coefficients. IEEE Trans Syst Man Cybern 5:557–561

  • Little RJA, Rubin DB (2019) Statistical analysis with missing data, vol 793. Wiley, New York

  • Lübbecke ME, Desrosiers J (2005) Selected topics in column generation. Oper Res 53(6):1007–1023

  • Lurie PM, Goldberg MS (1998) An approximate method for sampling correlated random variables from partially-specified distributions. Manag Sci 44(2):203–218

  • Medaglia AL, Sefair JA (2009) Generating correlated random vectors using mixed-integer programming. In: Proceedings of the IIE annual conference. Institute of Industrial and Systems Engineers (IISE), 1759

  • Mildenhall SJ (2005) Correlation and aggregate loss distributions with an emphasis on the Iman–Conover method. http://www.casact.org/pubs/forum/06wforum/06w105.pdf. Part one of “The Report of the Research Working Party on Correlations and Dependencies Among All Risk Sources.” Casualty Actuarial Society Forum (Winter 2005)

  • Mitchell CR, Paulson AS, Beswick CA (1977) Effect of correlated exponential service times on single server tandem queues. Naval Res Logist 24(1):95–112

  • Moorthy K, Saberi Mohamad M, Deris S (2014) A review on missing value imputation algorithms for microarray gene expression data. Curr Bioinform 9(1):18–22

  • Morris JA, Gardner MJ (1988) Calculating confidence intervals for relative risk (odds ratios) and standardised ratios and rates. Br Med J 296(6632):1313–1316

  • Nasr WW, Maddah B (2015) Continuous (s, S) policy with MMPP correlated demand. Eur J Oper Res 246(3):874–885

  • Oracle (2019) Oracle® Crystal Ball reference and examples guide. http://www.hstoday.us/. Release 11.1.2.4. Accessed July 2019

  • Park CG, Dong WS (1998) An algorithm for generating correlated random variables in a class of infinitely divisible distributions. J Stat Comput Simul 61(1–2):127–139

  • Park CG, Park T, Shin DW (1996) A simple method for generating correlated binary variates. Am Stat 50(4):306–310

  • Patuwo BE, Disney RL, McNickle DC (1993) The effect of correlated arrivals on queues. IIE Trans 25(3):105–110

  • Polge RJ, Holliday EM, Bhagavan BK (1973) Generation of a pseudo-random set with desired correlation and probability distribution. Simulation 20(5):153–158

  • Pouillot R, Delignette-Muller M-L (2010) Evaluating variability and uncertainty in microbial quantitative risk assessment using two R packages. Int J Food Microbiol 142(3):330–340

  • Qaqish BF (2003) A family of multivariate binary distributions for simulating correlated binary variables with specified marginal means and correlations. Biometrika 90(2):455–463

  • Reilly CH (2009) Synthetic optimization problem generation: show us the correlations! INFORMS J Comput 21(3):458–467

  • Rosenfeld S (2008) Approximate bivariate gamma generator with prespecified correlation and different marginal shapes. ACM Trans Model Comput Simul 18(4):16

  • Schmeiser BW, Lal R (1982) Bivariate gamma random vectors. Oper Res 30(2):355–374

  • Sefair JA, Méndez CY, Babat O, Medaglia AL, Zuluaga LF (2017) Linear solution schemes for mean-semivariance project portfolio selection problems: an application in the oil and gas industry. Omega 68:39–48

  • Sheskin DJ (2000) Handbook of parametric and nonparametric statistical procedures, 3rd edn. Chapman and Hall-CRC, Boca Raton

  • Shin K, Pasupathy R (2010) An algorithm for fast generation of bivariate Poisson random vectors. INFORMS J Comput 22(1):81–92

  • Shults J (2017) Simulating longer vectors of correlated binary random variables via multinomial sampling. Comput Stat Data Anal 114:1–11

  • Sigler EA, Tallent-Runnels MK (2006) Examining the validity of scores from an instrument designed to measure metacognition of problem solving. J Gener Psychol 133(2):257–276

  • Stanfield PM, Wilson JR, King RE (2004) Flexible modelling of correlated operation times with application in product-reuse facilities. Int J Prod Res 42(11):2179–2196

  • Todd CR, Ng MP (2001) Generating unbiased correlated random survival rates for stochastic population models. Ecol Model 144(1):1–11

  • Touran A (1993) Probabilistic cost estimating with subjective correlations. J Constr Eng Manag 119(1):58–71

  • Touran A, Suphot L (1997) Rank correlations in simulating construction cost. J Constr Eng Manag 123(3):297–301

  • Toyoda Y (1975) A simplified algorithm for obtaining approximate solutions to zero-one programming problems. Manag Sci 21(12):1417–1427. https://doi.org/10.1287/mnsc.21.12.1417

  • Van der Geest PAG (1998) An algorithm to generate samples of multi-variate distributions with correlated marginals. Comput Stat Data Anal 27(3):271–289

  • Wallis WA (1939) The correlation ratio for ranked data. J Am Stat Assoc 34(207):533–538

  • Xiao Q (2017) Generating correlated random vector involving discrete variables. Commun Stat Theory Methods 46(4):1594–1605

  • Yan C, Kung J (2016) Robust aircraft routing. Transp Sci 52(1):118–133

  • Young DJ, Beaulieu NC (2000) The generation of correlated Rayleigh random variates by inverse discrete Fourier transform. IEEE Trans Commun 48(7):1114–1127

  • Zhang Y, Khani A (2019) An algorithm for reliable shortest path problem with travel time correlations. Transp Res Part B Methodol 121:92–113. https://doi.org/10.1016/j.trb.2018.12.011

Acknowledgements

The authors would like to thank Professor Jim Wilson from NC State University for sharing his encouraging and valuable input at an earlier stage of this work. The authors sincerely thank Professor Douglas Montgomery at Arizona State University for his valuable comments to improve the manuscript. The authors also thank Gurobi and FICO for providing access to their commercial optimization solvers under their academic licensing programs, and the two anonymous reviewers, whose comments greatly improved the article. This material is based upon work supported by Dr. Sefair’s National Science Foundation Grant No. 1740042.

Author information

Corresponding author

Correspondence to Jorge A. Sefair.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sefair, J.A., Guaje, O. & Medaglia, A.L. A column-oriented optimization approach for the generation of correlated random vectors. OR Spectrum 43, 777–808 (2021). https://doi.org/10.1007/s00291-021-00620-5

  • DOI: https://doi.org/10.1007/s00291-021-00620-5
